ABSTRACT

Our research concerns optical data processing for missile guidance and target recognition.
It uses pattern recognition techniques with an increased use of knowledge base, inference machine,
and associative processor techniques. Our Year 3 work concerns new algorithms, real time and
practical realizations of such systems, and new initial work on associative processors, symbolic
and rule-based processors, and directed graph processors (with new attention to unique optical
realizations of such systems).

1.  INTRODUCTION

The  work  in  the  past  year  of  this  grant  (1  January  1987  -  31  December  1987)  and  its  no- 
cost  extension  (January-March  1988)  produced  results  on  various  new  optical  pattern  recognition 
algorithms,  real  time  laboratory  results,  new  practical  computer  generated  hologram  recording 
techniques,  and  four  new  areas  of  potential  work  in  optical  artificial  intelligence  (these  include 
associative  processors,  symbolic  and  rule  based  systems,  as  well  as  directed  graph  optical 
processors). 

In this last year, the Principal Investigator (PI) and our AFOSR optical data processing
effort were quite visible within the community. The PI served on the Defense Science Board
Task Force on Image Recognition, gave two invited talks at non-optical processing conferences
[1,2], contributed an invited survey paper on optical pattern recognition and artificial
intelligence [3], served on a NASA review committee on photonics, participated in several panel
discussions, and produced a book chapter on optical feature extraction [4], an encyclopedia
article [7], and numerous papers and conference presentations. This ends our pattern recognition
AFOSR work. The results we have obtained should be of use in many future aspects of optical
processing for image and scene analysis. These results are well documented through our
publication effort. They have also been published in various non-optical journals to provide
wider exposure for this technology.

We now highlight our research results in this third year of our work. Each result is more
fully detailed in subsequent chapters, as noted. In our first thrust area, new pattern
recognition algorithms and architectures, new Hough transform techniques for distortion-invariant
pattern recognition [5] were devised and demonstrated (Chapter 2 details these), a large 1000
class pattern recognition problem was addressed [6] with attractive initial results (Chapter 3
details this work), and a new string com- processor [8] (detailed in Chapter 4) was advanced. Our
second thrust area provided real time laboratory results of distortion-invariant pattern
recognition using a liquid crystal television [9] (Chapter 5 details this work), and practical
computer generated hologram (CGH) synthesis techniques using a laser printer were advanced
[10] (Chapter 6 details this work). Our third major research area involved optical artificial
intelligence processors. This work provided new results in associative processors, symbolic
processors, rule based and directed graph processors. This included: new error correction
associative processor concepts [11] as detailed in Chapter 7, new associative memory mapping
realizations of an optical feature space [12] (Chapter 8), new heteroassociative memory processor
performance measures and recollection vector encoding choices [13] (Chapter 9), symbolic and
rule-based processors [14] as detailed in Chapter 10, and directed graph optical processor
concepts and realizations [15] as detailed in Chapter 11. These last 5 items represent major new
optical processing contributions to knowledge processing. Chapter 12 provides full
documentation of our publications, presentations given, and theses produced related to this
AFOSR effort. The 90 papers and over 100 technical talks presented in the three years of this
program represent a significant contribution to optical data/information/knowledge processing
research and to directions for future research in this area.
2.  HOUGH  TRANSFORM  FOR  PATTERN

Hough Space Transformations for Discrimination and
Distortion Estimation

Raghuram Krishnapuram and David Casasent

Department of Electrical and Computer Engineering, Carnegie Mellon University

A new use of the Hough transform space defined for straight lines is described. The Hough
space is used directly with new efficient distortion parameter transformations and template
matching. This technique allows multiclass discrimination, intra-class distortion invariant
recognition, and multiple distortion parameter estimation. A new hierarchical distortion
parameter search method and spatial quantization in Hough space make realization of this
technique very attractive. Performance of our algorithm on aircraft imagery and in the
presence of noise is provided. © Academic Press, Inc.


1. INTRODUCTION


The Hough transform [1, 2], as suggested originally, is a method for detecting
straight-line segments in an input image. This concept has been extended to include
other analytically representable curves such as circles and ellipses [3]. It was further
generalized to include arbitrary shapes and even three-dimensional (3-D) objects
[4, 5]. These extensions are commonly referred to as generalized Hough transforms.
The earlier versions of the generalized Hough transforms [6] required the computation
of the gradient of each edge element and their storage in the form of a table. To
reduce the computational burden, Davis [7] suggested a hierarchical Hough transform
in which subpatterns of the image rather than the edge elements (pixels) were
used as the basic units. The implementation of this approach is quite complex since
we must deal with patterns rather than pixels.

Ballard and Sabbah [4] used a similar concept employing line segments rather
than edge elements. They also suggested a different type of generalized Hough
transform for detecting one type of object of arbitrary shape with scale, rotation,
and translation differences present. They assume that the object boundary can be
approximated by straight-line segments and that a list of the exact lengths, orientations,
and positions of all object boundary segments (with respect to a reference
point on the object) is available. It is difficult but possible to obtain such a list for
the model of the object being searched for. However, it is computationally burdensome
to accurately obtain such a list for an input image, especially when noise is
present. Implementing this efficiently will probably require a special symbolic
language to handle the lists, especially when the lists are complicated. Detecting
peaks in the Hough domain is difficult, especially when bias and noise are present
[8]. It is difficult to quantify how well any such method will work when extraction of
line segments in the input image is not easily achieved. The performance of such
methods in the presence of noise is also not easily analyzed. All of these methods
presuppose that the type of object being searched for is known in advance, i.e., they
are only applicable to one-class problems and do not easily provide discrimination
against other object types.
Fig. 1. Image plane to Hough transform plane mapping. (a) Points A and B in the image plane are
mapped to (b) curves A and B in the Hough transform plane. The line in (a) maps to the point
$(\rho', \theta')$ in (b).


The basic Hough transform for straight lines can be readily implemented digitally
if the conventional parameterization in terms of the normal distance $\rho$ and angle $\theta$
for straight lines is used. Figure 1 shows this classic image plane $f(x, y)$ to Hough
transform plane $H(\theta, \rho)$ mapping for a line. Each point $(x, y)$ in $f(x, y)$ is mapped
to a sinusoidal curve in $H(\theta, \rho)$ given by

$$x\cos\theta + y\sin\theta = \rho. \tag{1}$$

This sinusoidal curve gives the $\rho$ and $\theta$ parameters of all the straight lines passing
through the point $(x, y)$. Each point on the straight line maps to a different
sinusoidal curve (e.g., A and B in Fig. 1b) given by Eq. (1). All these curves
intersect at a point in the Hough space and this point defines the $\rho$ and $\theta$
parameters for the straight line shown in Fig. 1a.
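
The intersection property of Eq. (1) can be checked numerically. The following is a minimal Python/NumPy sketch, not the implementation used in this work: the helper name `sinusoid`, the example line (normal angle 30 degrees, normal distance 5), and the 1-degree sampling are all assumptions chosen for illustration.

```python
import numpy as np

def sinusoid(x, y, thetas):
    """Return rho(theta) = x cos(theta) + y sin(theta), Eq. (1), for one image point."""
    return x * np.cos(thetas) + y * np.sin(thetas)

# Hypothetical line with normal angle 30 degrees and normal distance 5.
theta0, rho0 = np.deg2rad(30.0), 5.0
# Points on the line: foot of the normal plus multiples of the line direction.
foot = rho0 * np.array([np.cos(theta0), np.sin(theta0)])
direction = np.array([-np.sin(theta0), np.cos(theta0)])
points = [foot + s * direction for s in (-4.0, 0.0, 2.5, 7.0)]

thetas = np.deg2rad(np.arange(0, 180))   # 1-degree sampling (assumed)
curves = np.array([sinusoid(x, y, thetas) for x, y in points])

# The spread of rho across the points vanishes only where all curves intersect.
spread = curves.max(axis=0) - curves.min(axis=0)
k = int(np.argmin(spread))
theta_hat, rho_hat = np.rad2deg(thetas[k]), curves[0, k]
```

In this sketch the four sinusoids meet at the sampled angle closest to the line's normal angle, and the common $\rho$ value there is the line's normal distance.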

The calculation of this Hough transform requires only simple multiplications
involving trigonometric functions. Since the same multiplications are performed for
every edge pixel in the image, computation of the Hough transform can be achieved
in parallel. The results are accumulated in the $H(\theta, \rho)$ Hough array. It has also
been shown [9] that the Hough transform is a special case of the Radon transform
and that it can also easily be computed using optical techniques at video rates
[10, 11]. This transform and $H(\theta, \rho)$ are thus very attractive for the low-level
representation of images of objects.

This paper describes an alternative approach to estimation of the scale, rotation,
and translation of an input image with respect to a reference image. It uses the basic
straight-line Hough transform space. The proposed method is unique because it is
capable of handling multiclass problems. Our approach is also original because the
matching is performed directly in the Hough space. This differs significantly from
the other approaches in which Hough techniques (i.e., accumulating votes in a 2-D
or 3-D parameter array) are used for matching tables. In Section 2, we review the
ease with which one can obtain the Hough transform of the input image. Section 2
also discusses the various applications and realizations of the Hough technique and
the advantages and disadvantages of each. In Section 3 we detail our use of the
Hough space for distortion invariance. This involves new transformations applied to
the Hough space. The ease with which the transformations can be achieved is
discussed. A hierarchical matching technique is detailed in Section 4 that significantly
reduces the computations required to determine the object class and the
object orientation. The image database used and the results obtained are then
advanced (Sect. 5) and noise performance is also provided. Finally, Section 6
summarizes our work and advances our conclusions.

2. THE HOUGH DOMAIN AS A 2-D FEATURE SPACE

The algorithms suggested thus far to estimate the scale, rotation, and translation
parameters of an input image using the Hough technique require the compilation of
some form of a list or table. This list can be precomputed, as in [4], or dynamically
computed, as in [5]. The R-table [6] requires the storage of a list of the gradients of
all edge elements and their positions with respect to a reference point for the object
to be searched for. For an unknown input image, the location of the reference point
must be determined. To achieve this, an accumulator or Hough array is created with
each element denoting a possible location of the reference point in the input image.
The list from the model is used to compute the possible locations of the reference
point with respect to each edge element in the input image, where each possible
location corresponds to a particular translation and rotation of the object. Thus,
each edge element in the input image votes for all possible locations of the reference
point and these votes are accumulated in the Hough array. When the voting process
has been completed for all edge elements, the peaks in the array indicate the
possible locations of the reference point in the input image and thus denote the
object's possible location. A similar approach using line segments rather than edge
elements has been suggested [4].

In both cases, if the scale, rotation, and translation parameters of the object are to
be estimated simultaneously, a 4-D Hough array is needed. In this array, two
dimensions denote the two translation parameters and the remaining two dimensions
denote rotation and scale. This significantly increases the computational
complexity and the memory requirements. Peak detection can be very difficult in
such a 4-D array [8] since we must deal with hypersurfaces. To overcome some of
these problems, a two-level approach has been recommended in [4], in which the
scale and orientation are estimated first (using a 2-D Hough array) and then
translation is estimated (in a second-level 2-D Hough array). The digital implementation
of these methods is straightforward and can be realized in parallel [12], given
sufficient hardware and once the list has been obtained from the model and the line
segment information has been extracted from the input image. (Accurate calculation
of the line segment data from the input image can be very difficult.)

To reduce the memory requirements and computational burden, another approach
has been suggested by Li, Lavin, and LeMaster [13]. Here the voting process
is carried out only in those parts of the Hough array where peaks are likely to occur.
This method, however, applies only to situations in which an element in the input
image votes on a hyperplane (and not on the more general hypersurface) in the
parameter (Hough) space. It is also not known how well the method will perform
when the peaks are diffused.

Several problems associated with these prior methods are worth noting. As
Ballard and Sabbah point out [4], the position information is ignored while
estimating the scale and orientation in the first level. As a result, peaks can occur in
the accumulator array due to line segments in the input image that do not even
touch each other and due to line segments that do not even lie near each other. Thus
many false peaks can and do arise in the accumulator array. Another potential
problem [8] with these methods is the detection of the peaks in the Hough array.
Because of the inherent noise and bias present in the Hough transform, sharp peaks
rarely occur; rather, all peaks are distorted and diffused (smeared). Thus, we require
the detection of local peaks rather than global peaks, and hence, sophisticated peak
threshold methods. This problem becomes much worse when the dimensionality of
the Hough array is large, since we must then deal with hypersurfaces. Another
major problem with these prior methods is that they require the detection of the
gradients and the positions of the edge elements in the input image, prior to the
application of the Hough technique. If line segments are used, their orientations and
positions are required. This image preprocessing often requires special edge-following
and line-fitting algorithms which can be inaccurate and tedious. The final
major problem with all of these methods is that they are object-specific and
must thus be reformulated if a new object is to be searched for.

In this paper, we describe a different usage of the Hough transform to overcome
these problems. In what follows, it should be understood that by Hough transform
(HT), we mean the basic Hough transform defined for straight lines.

A simple and fast method of computing the lengths and orientations of the line
segments in the input image is to use the Hough transform itself. The peaks in the
Hough transform give the strengths and orientations of all lines in the input image.
However, it suffers from the same problem as do the earlier methods since peak
detection can again be difficult. Therefore, our new suggestion is not to extract any
information from the Hough transform, but to simply use the Hough space as it is.

The basic idea of our approach is to approximate an object by a set of line
segments and to describe these segments by a given 2-D pattern in the Hough
domain. Thus, two similar objects would have similar Hough transforms and two
different objects would have different Hough transforms. If the object is scaled,
rotated, or translated, the Hough transform will change and distort. However, as we
detail in Section 3, it is possible to define new transformations in the Hough domain
that can remove these distortions and reconstruct the Hough transform of the
original object in the reference orientation. When this is done, a simple template
matching with the Hough transforms of different reference objects determines if the
input object is a distorted version of a given object. It also determines the class of
the object and its distortion parameters. This method can thus be used to discriminate
between different types of objects (from the similarity of the template
matches of their respective Hough transforms).

Eight distinct advantages of this approach are now noted. (1) It does not require
extracting orientation and position information of edge elements or the lengths and
orientations of line segments in the input image. (2) We do not need to detect the
peaks in the Hough domain. The inherent Hough bias will reduce our discrimination
capability, but it is not a serious problem unless the two objects are very
similar. (3) This technique uses only a 2-D Hough space and thus there is no
concern with hypersurfaces. As a result, (4) real-time computation is possible, and
(5) memory requirements are small. Memory requirements can be further reduced
by coarsely discretizing the parameters of the Hough space. Because we use the
Hough space itself, considerable quantization is allowed. (6) By using multiple
Hough space reference patterns, this method can be used for multi-class problems.
(7) The use of this Hough space as a 2-D pattern in a correlator is attractive and
allows shift invariance. (8) Last, this approach can be easily extended to the
recognition of 3-D range images and to the detection of 3-D orientation and
translation. This can be achieved without increasing the dimensionality of the
Hough space (as we will detail in a future publication).

Fig. 2. Position vector $\mathbf{P}$, unit vector $\mathbf{a}(\theta)$, and projection $\rho$ as defined by Eqs. (2) and (3).

3. HOUGH SPACE DISTORTION TRANSFORMATIONS

In this section, we present a vector description of the Hough transform for
distorted objects. Our Hough space distortion transforms then directly follow.

3.1.  Vector  Description 

In this approach, each point $(x, y)$ in the image is represented by a position
vector $\mathbf{P} = x\mathbf{i} + y\mathbf{j}$ from the origin as shown in Fig. 2. Here $\mathbf{i}$ and $\mathbf{j}$ are unit vectors
along the $x$ and $y$ directions, respectively. The point $\mathbf{P}$ shown can lie on many
(theoretically an infinite number of) lines that pass through it. Each of these straight
lines can be characterized by a unit vector $\mathbf{a}(\theta)$ and a magnitude $\rho$. The unit vector
$\mathbf{a}(\theta)$ extends from the origin perpendicular to the line and at an angle $\theta$ with respect
to the positive $x$ axis and $\rho$ is the shortest projection distance from the origin to the
line. The unit vector is described by

$$\mathbf{a}(\theta) = (\cos\theta)\,\mathbf{i} + (\sin\theta)\,\mathbf{j} \tag{2}$$

and the projection is defined by

$$\mathbf{P} \cdot \mathbf{a}(\theta) = \rho. \tag{3}$$

By varying $\theta$ and performing the required vector inner products in (3), we can easily
generate the $\mathbf{a}(\theta)$ vectors and the corresponding $\rho$ values for all possible straight
lines passing through a particular point $\mathbf{P}$.

We consider only a finite number of $\theta$ values between 0 and $2\pi$. Thus, the
process described above generates a finite list of $(\theta, \rho)$ pairs that characterize the
corresponding straight lines passing through $\mathbf{P}$. The point $\mathbf{P}$ is said to "vote" for all
of those $(\theta, \rho)$ pairs in the Hough space. To represent the Hough space as a finite
2-D array, we discretize the values of $\rho$ also. When the votes for all $(\theta, \rho)$ pairs
have been accumulated for all points or edge elements in the input image, then the
result is the discrete Hough transform of the input image. We assume that $\rho$ is
positive. If $\mathbf{P} \cdot \mathbf{a}(\theta) < 0$, we ignore the corresponding $(\theta, \rho)$ vote, since this implies
$\mathbf{P} \cdot \mathbf{a}(\theta + \pi) > 0$ and that the associated vote would be counted at $(\theta + \pi, \rho)$. If
$\rho = 0$, this corresponds to a line through the origin and for this case, $(\theta, \rho)$ and
$(\theta + \pi, \rho)$ represent the same straight line. Thus, we need consider $\theta$ values only
between 0 and $\pi$ for the top $\rho = 0$ row in our plots and computations.
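
The voting convention described above (nonnegative $\rho$, $\theta$ sampled over a full $2\pi$) can be sketched in Python/NumPy as follows. This is an illustrative sketch under stated assumptions, not the code used in this work: the function name `hough_transform`, the unit-width $\rho$ bins, and the array sizes are hypothetical, and the special handling of the $\rho = 0$ row is omitted for brevity.

```python
import numpy as np

def hough_transform(points, n_theta=360, n_rho=64):
    """Accumulate votes in H[rho, theta] with rho >= 0 and theta in [0, 2*pi).

    Assumes unit-width rho bins (rows 0 .. n_rho - 1); both bin counts are
    illustrative choices, not values taken from the paper.
    """
    thetas = 2.0 * np.pi * np.arange(n_theta) / n_theta
    H = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)  # P . a(theta), Eq. (3)
        cols = np.nonzero(rho >= 0)[0]                 # negative projections are counted at theta + pi
        rows = np.rint(rho[cols]).astype(int)          # discretize rho to the nearest bin
        ok = rows < n_rho                              # drop votes that fall beyond the array
        H[rows[ok], cols[ok]] += 1
    return H, thetas

# Seven collinear points on a hypothetical line with theta0 = 45 degrees, rho0 = 10.
theta0, rho0 = np.deg2rad(45.0), 10.0
foot = rho0 * np.array([np.cos(theta0), np.sin(theta0)])
direction = np.array([-np.sin(theta0), np.cos(theta0)])
points = [foot + s * direction for s in range(-3, 4)]
H, thetas = hough_transform(points)
```

All seven points vote in the cell for $\theta = 45°$, $\rho = 10$, while the column at $\theta = 225°$ (where every projection is negative) receives no votes.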

3.2.  Hough  Transform  of  a  Scaled  Image 

Let $f_s(x, y)$ be a scaled version of $f(x, y)$ with scale factor $s$, such that a point $\mathbf{P}$
at $(x, y)$ maps to a point $\mathbf{P}_s$ at $(x/s, y/s)$. Since $\mathbf{P}_s \cdot \mathbf{a}(\theta) = \rho/s$, the votes that
occurred at $(\theta, \rho)$ in the original Hough transform now occur at $(\theta, \rho/s)$ for this
scaled object. Thus, the Hough transform is compressed or expanded along the $\rho$
axis only, depending on whether $s > 1$ or $s < 1$. The Hough transform $H_s(\theta, \rho)$ of
the scaled image $f_s(x, y)$ is thus related to the Hough transform $H(\theta, \rho)$ of the
original image by

$$H_s(\theta, \rho/s) = H(\theta, \rho). \tag{4}$$

The above equation can thus be used to reconstruct $H(\theta, \rho)$ from $H_s(\theta, \rho)$ as we
detail later.

3.3.  Hough  Transform  of  a  Rotated  Image 

Let $f_r(x, y)$ be the original image rotated in the image plane by an angle $\phi$. In
Fig. 3, we show one point $\mathbf{P}$ on the original object and the associated point $\mathbf{P}_r$ on the
rotated object. In polar coordinates, $\mathbf{P}$ lies at $(r, \psi)$ and $\mathbf{P}_r$ lies at $(r, \psi + \phi)$. Since
$\mathbf{P} \cdot \mathbf{a}(\theta) = \rho = \mathbf{P}_r \cdot \mathbf{a}(\theta + \phi)$, it follows that the votes at $(\theta, \rho)$ in the original
$H(\theta, \rho)$ now occur at $(\theta + \phi, \rho)$ in the Hough transform $H_r(\theta, \rho)$ of $f_r(x, y)$. The
new and original transforms are thus related by

$$H_r(\theta + \phi, \rho) = H(\theta, \rho). \tag{5}$$

To obtain the original Hough transform from $H_r(\theta, \rho)$ of the rotated image, we
need merely shift the Hough array horizontally by an amount equal to the rotation
$\phi$. This shift is a circular shift since the points $(\theta, \rho)$ and $(\theta + 2\pi, \rho)$ are equivalent
in the Hough domain.

Fig. 3. A point $\mathbf{P}$ on the object, and its position ($\mathbf{P}_r$) when the object is rotated about the origin by an
angle $\phi$.

Fig. 4. A point $\mathbf{P}$ on the object, and its position $\mathbf{P}_t$ when the object is translated by $\mathbf{T}$.
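
The circular-shift relation of Eq. (5) can be illustrated with a short Python/NumPy sketch. The helper names (`hough`, `rotate`, `undo_rotation`), the 1-degree $\theta$ sampling, the unit-width $\rho$ bins, and the three test points are all assumptions made for this example; the rotation is assumed to be a whole number of $\theta$ bins.

```python
import numpy as np

def hough(points, n_theta=360, n_rho=32):
    """Minimal rho >= 0 Hough accumulator H[rho, theta] with unit rho bins (assumed)."""
    thetas = 2.0 * np.pi * np.arange(n_theta) / n_theta
    H = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        cols = np.nonzero(rho >= 0)[0]
        rows = np.rint(rho[cols]).astype(int)
        ok = rows < n_rho
        H[rows[ok], cols[ok]] += 1
    return H

def rotate(points, phi):
    """Rotate points about the origin by phi radians."""
    c, s = np.cos(phi), np.sin(phi)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def undo_rotation(H_r, phi, n_theta=360):
    """Eq. (5): H(theta, rho) = H_r(theta + phi, rho), a circular column shift."""
    shift = int(round(phi * n_theta / (2.0 * np.pi)))
    return np.roll(H_r, -shift, axis=1)

points = [(3.3, 1.7), (-2.1, 4.6), (5.2, -0.9)]   # hypothetical object points
phi = np.deg2rad(40.0)
H = hough(points)
H_r = hough(rotate(points, phi))
H_rec = undo_rotation(H_r, phi)
```

Because the rotation here is an exact multiple of the $\theta$ sampling interval, the shifted array reproduces the original transform; for other angles some interpolation or rounding would be needed, which this sketch does not address.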

3.4.  Hough  Transform  of  a  Translated  Image 

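Let $f_t(x, y)$ be the image obtained by translating the object by $(a, b)$ and let
$H_t(\theta, \rho)$ be its Hough transform. A point $\mathbf{P}$ in the original image will now lie at
$\mathbf{P}_t = \mathbf{P} + \mathbf{T}$, where the translation vector $\mathbf{T} = a\mathbf{i} + b\mathbf{j}$ is shown in Fig. 4. We let the
projection magnitude be $\mathbf{P} \cdot \mathbf{a}(\theta) = \rho$ for a line corresponding to an angle $\theta$. Then,
the projection magnitude for the translated point is computed as

$$\begin{aligned}
\mathbf{P}_t \cdot \mathbf{a}(\theta) &= (\mathbf{P} + \mathbf{T}) \cdot \mathbf{a}(\theta) = \mathbf{P} \cdot \mathbf{a}(\theta) + \mathbf{T} \cdot \mathbf{a}(\theta) \\
&= \rho + (a\mathbf{i} + b\mathbf{j}) \cdot (\cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}) \\
&= \rho + a\cos\theta + b\sin\theta = \rho + t\cos(\theta - \alpha),
\end{aligned} \tag{6a}$$

where

$$t = (a^2 + b^2)^{1/2}; \qquad \alpha = \tan^{-1}(b/a). \tag{6b}$$

The second half of Eq. (6a) follows from a trigonometric identity. We hereafter
describe translations by the parameters $t$ and $\alpha$. To evaluate and interpret (6), we
consider two cases separately.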
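
For reference, the trigonometric identity behind the last step of Eq. (6a) can be written out in two lines. The derivation uses only the definitions in Eq. (6b); the relations $a = t\cos\alpha$ and $b = t\sin\alpha$ are stated here as the assumed quadrant-consistent reading of $\alpha = \tan^{-1}(b/a)$.

```latex
\begin{aligned}
a &= t\cos\alpha, \qquad b = t\sin\alpha, \\
a\cos\theta + b\sin\theta &= t\,(\cos\alpha\cos\theta + \sin\alpha\sin\theta) = t\cos(\theta - \alpha).
\end{aligned}
```

In a numerical setting, a two-argument arctangent is the usual way to keep $\alpha$ in the correct quadrant when $a < 0$.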

Case 1. $\rho + t\cos(\theta - \alpha) \ge 0$.

In this case, if the point $\mathbf{P}$ voted for a point $(\theta, \rho)$ in the Hough domain, the same
vote would occur at $(\theta, \rho + t\cos(\theta - \alpha))$ in $H_t(\theta, \rho)$. Therefore, the elements of
the column corresponding to $\theta$ in the original Hough array are shifted along the
positive $\rho$ axis by an amount equal to $t\cos(\theta - \alpha)$.

Case 2. $\rho + t\cos(\theta - \alpha) < 0$.

In this case the vote does not occur at $(\theta, \rho + t\cos(\theta - \alpha))$ since
$\rho + t\cos(\theta - \alpha) < 0$. (Recall that in the Hough space, $\rho \ge 0$.) However, this implies
that $-\mathbf{P}_t \cdot \mathbf{a}(\theta) = \mathbf{P}_t \cdot \mathbf{a}(\theta + \pi) = -(\rho + t\cos(\theta - \alpha)) > 0$ and therefore the vote
would be entered at $(\theta + \pi, -\rho - t\cos(\theta - \alpha))$ in the new Hough space.
Combining these two cases, we can obtain $H(\theta, \rho)$ from $H_t(\theta, \rho)$ as

$$H(\theta, \rho) = \begin{cases}
H_t\bigl(\theta,\ \rho + t\cos(\theta - \alpha)\bigr) & \text{if } \rho + t\cos(\theta - \alpha) \ge 0 \\
H_t\bigl(\theta + \pi,\ -\rho - t\cos(\theta - \alpha)\bigr) & \text{if } \rho + t\cos(\theta - \alpha) < 0.
\end{cases} \tag{7}$$

These results show that a translation of the object causes shifts in the Hough
transform in the vertical ($\rho$) direction only. The amount of the shift is a function of
$\theta$ for each object point, i.e., it varies along the horizontal $\theta$ axis in the Hough space.
For each column with a positive shift, there is a corresponding column a circular
distance $\pi$ away in $\theta$ that requires an equal negative shift. This occurs because
$t\cos(\theta - \alpha + \pi) = -t\cos(\theta - \alpha)$. Thus half of the columns in $H_t(\theta, \rho)$ will have
positive shifts and half of them will have corresponding negative shifts when we
produce $H(\theta, \rho)$ from $H_t(\theta, \rho)$. Those elements that are shifted out of the Hough
space as a result of the negative shifts reenter the Hough space a circular distance $\pi$
away, as we explained in Case 2.
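
The column-dependent shift and the reentry a distance $\pi$ away can be sketched as a gather operation on the stored array. The following Python/NumPy sketch is one possible reading of the two cases above, under stated assumptions: the function name `undo_translation`, unit-width $\rho$ bins, an even number of $\theta$ columns spanning $[0, 2\pi)$, and nearest-bin rounding of the shift are all illustrative choices.

```python
import numpy as np

def undo_translation(H_t, a, b):
    """Gather H(theta, rho) from H_t following the two translation cases.

    Rows are rho, columns are theta. Assumes unit-width rho bins, an even
    number of theta columns spanning [0, 2*pi), and nearest-bin rounding.
    """
    n_rho, n_theta = H_t.shape
    half = n_theta // 2                          # column offset equal to pi
    t, alpha = np.hypot(a, b), np.arctan2(b, a)  # translation magnitude and angle
    thetas = 2.0 * np.pi * np.arange(n_theta) / n_theta
    shifts = t * np.cos(thetas - alpha)          # precomputed shift per column
    H = np.zeros_like(H_t)
    for j in range(n_theta):
        for rho in range(n_rho):
            src = int(np.rint(rho + shifts[j]))
            col = j
            if src < 0:                          # Case 2: vote reentered at theta + pi
                src, col = -src, (j + half) % n_theta
            if src < n_rho:                      # ignore sources beyond the array
                H[rho, j] = H_t[src, col]
    return H

# Synthetic translated array: two votes in the theta = 0 column (hypothetical data).
H_t = np.zeros((16, 360), dtype=int)
H_t[7, 0] = 1
H_t[2, 0] = 1
H = undo_translation(H_t, a=5.0, b=0.0)
```

With a translation of 5 along $x$, the vote at $\rho = 7$ in the $\theta = 0$ column returns to $\rho = 2$ in the same column, while the vote at $\rho = 2$ belongs to a point whose original projection was negative there, so it reappears at $\rho = 3$ in the $\theta = \pi$ column.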

3.5. Combined Scale, Rotation, and Translation Transformation

Equations (4), (5), and (7) can be combined to yield

$$H(\theta, \rho) = \begin{cases}
H'\!\left(\theta + \phi,\ \dfrac{\rho + t\cos(\theta - \alpha)}{s}\right) & \text{if } \rho + t\cos(\theta - \alpha) \ge 0 \\[2ex]
H'\!\left(\theta + \phi + \pi,\ \dfrac{-\rho - t\cos(\theta - \alpha)}{s}\right) & \text{if } \rho + t\cos(\theta - \alpha) < 0.
\end{cases} \tag{8}$$

This relates $H'(\theta, \rho)$ for a general distortion to $H(\theta, \rho)$. In Eq. (8), it is understood
that the additions to $\theta$ are performed modulo $2\pi$.
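
As a small worked illustration, the coordinate mapping of the combined transformation can be written as a function that returns where in $H'$ the value for a given $(\theta, \rho)$ is read. This Python sketch assumes the combined relation is the direct composition of Eqs. (4), (5), and (7), with the translated projection divided by the scale factor $s$; the function name `source_cell` and the argument order are hypothetical.

```python
import math

def source_cell(theta, rho, s, phi, t, alpha):
    """Return the (theta', rho') location in H' read for H(theta, rho).

    Assumes the combined relation composes the scale, rotation, and
    translation relations directly; angles are in radians.
    """
    r = rho + t * math.cos(theta - alpha)   # translated projection
    two_pi = 2.0 * math.pi
    if r >= 0:
        return ((theta + phi) % two_pi, r / s)
    # Negative projection: the vote sits a circular distance pi away.
    return ((theta + phi + math.pi) % two_pi, -r / s)

same_cell = source_cell(0.0, 4.0, 2.0, 0.0, 0.0, 0.0)        # scale only
wrapped = source_cell(math.pi, 1.0, 1.0, 0.0, 3.0, 0.0)      # translation, negative case
```

The first call shows a pure scale change mapping $\rho = 4$ to $\rho/s = 2$; the second shows a negative projection being redirected to the column a distance $\pi$ away, with the addition to $\theta$ taken modulo $2\pi$.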


3.6.  Digital  Implementation  of  Distortion  Transformations 

A digital implementation of the distortion transformations is particularly simple.
Assume that $H'(\theta, \rho)$ is stored as a 2-D array and that the translation of the object
is known. To undo the distortion in $H'(\theta, \rho)$ caused by translation, we need merely
shift the columns corresponding to different $\theta$ by an amount $t\cos(\theta - \alpha)$. Since $t$
and $\alpha$ are known, the amount of shift for each $\theta$ can be precomputed. If we feed
each element in the top row of the new $H'(\theta, \rho)$ to the element in the same row a
distance $\theta = \pi$ away horizontally, then as the elements of $H'(\theta, \rho)$ are shifted out
from the top row in one column, they enter the proper column a distance $\theta = \pi$
away, causing downward shifts in these columns. This follows from the earlier
discussion of Eq. (7). This is easily achieved by up/down shift register type
memories.

Having corrected the effects of translation as above, the $H'(\theta, \rho)$ distortion
effects due to rotation $\phi$ are similarly corrected by circularly shifting the rows of
$H'(\theta, \rho)$ by $\phi$ in the $\theta$ direction.

To produce $H(\theta, \rho)$ from $H_s(\theta, \rho)$ for a scaled input and a given $s$, we consider
two cases (depending on whether $s > 1$ or $s < 1$). We assume that $\rho$ and $s$ or $1/s$
are integers. (The implementation is a little more involved if $s$ is not an integer and
will not be discussed in this paper.)

Case  1 .  .r  >  1  (compressed  image). 

Assume  that  s  is  an  integer.  H/6,  p/s )  is  defined  only  for  those  values  of  p  for 
which  p/s  is  an  integer.  Thus,  using  (4)  we  produce  H(8,  p)  from  H/6.  p)  for  p 
such  that  p/s  is  an  integer.  The  remaining  rows  in  H(8.  p)  are  assigned  zero 
values.  Thus,  we  produce  H(Q .  p)  from  H/8.  p)  by  (4)  for  rows  p  where  p/s  is  an 
integer  and  by  inserting  zero-valued  rows  in  the  appropriate  rows  p  of  the  array 
where  p/s  is  not  an  integer.  This  operation  is  also  easily  achieved  in  advanced 
memory  arrays. 

Case  2.  s  <  1  (expanded  image). 

Here we replace s by 1/s (an integer). From (4), for the case of a scale change, we
require H_s(θ, sρ) = H(θ, ρ) and H_s(θ, s(ρ + 1)) = H(θ, ρ + 1) for all ρ.
Consider row r in H_s(θ, ρ) such that sρ < r < s(ρ + 1). Since r is not exactly divisible
by s, no row in H(θ, ρ) exactly corresponds to this row in H_s(θ, ρ). Therefore, we
add the votes for this row to the nearest discretized value of r/s (either ρ or ρ + 1).
Thus, to obtain H(θ, ρ) from H_s(θ, ρ) for a given s, we need merely shift the data
in all rows r in H_s(θ, ρ) (for which r/s is not an integer) and add these data to the
data in the closest rows that are divisible by s. These scale distortion transformations
can also be easily implemented using shift-and-add memory techniques.
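The two scale cases can be sketched as row operations on the Hough array. This is only an illustration under stated assumptions: the array is a list of rows indexed by ρ, s is the integer factor of each case, ties in Case 2 are rounded upward, and any amplitude factor that Eq. (4) may contain is ignored. The function names are hypothetical.

```python
def correct_compressed(Hs, s):
    """Case 1: output row s*q takes row q of Hs; all other rows are zero."""
    n_theta = len(Hs[0])
    out = []
    for row in Hs:
        out.append(list(row))
        out.extend([0] * n_theta for _ in range(s - 1))  # inserted zero rows
    return out


def correct_expanded(Hs, s):
    """Case 2: add the votes in row r of Hs to the row nearest to r/s."""
    n_theta = len(Hs[0])
    n_out = (len(Hs) - 1) // s + 2  # enough rows for the largest target index
    out = [[0] * n_theta for _ in range(n_out)]
    for r, row in enumerate(Hs):
        p = (2 * r + s) // (2 * s)  # nearest integer to r/s, ties rounded up
        for j, v in enumerate(row):
            out[p][j] += v  # shift-and-add
    return out
```

Both functions preserve the total number of votes, which is a quick sanity check on any shift-and-add implementation.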

4. HIERARCHICAL MATCHING

In  the  previous  section,  we  described  a  method  of  efficiently  producing  the  Hough 
transform  of  the  image  for  a  given  scale,  rotation,  and  translation.  The  method 
assumes  that  the  scale,  rotation  and  translation  parameters  are  known.  In  practice, 
we  are  given  a  reference  image  and  are  required  to  estimate  these  parameters  for  an 
input  image.  In  this  section,  we  address  simple  techniques  to  estimate  these 
parameters. 

4.1. Brute Force Method

One method uses brute force. In this method, we consider all probable combinations
of these distortion parameters and for each of these allowable combinations,
we construct the associated Hough transform from the observed Hough transform
of the input image. The combination of distortion parameters that gives an H(θ, ρ)
that best matches that of the reference(s) yields the distortion estimates and the
object class estimate. If the number of possible combinations of distortion parameters
is large, the brute force method will be slow and inefficient.

4.2. Reduced Distortion Parameter Search

We thus consider methods and cases when the number of possible distortion
parameter combinations can be reduced. This is, of course, very application-specific.
If the target range data (or a range image) is available, the value of scale can be
estimated quite accurately. If the application concerns top-down views of objects
such as aircraft, then the orientation and location in H(θ, ρ) of the two parallel
lines that define the fuselage of the aircraft provide a good estimate of the object's
rotation. Additional object distortion information is easily obtained from simple
operations on the image. For example, the translational location of the object can be
determined from the projections of the image along the x and y axes or from the
first-order moments m10 and m01.
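As a small illustration of the moment-based location estimate, the sketch below computes the centroid of a binary image from its zeroth- and first-order moments. The names m00, m10 and m01 and the convention that x runs along columns and y along rows are assumptions made for the example.

```python
def centroid(image):
    """Return (x, y) centroid of a binary image given as a list of rows."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            m00 += v       # zeroth-order moment: number of object pixels
            m10 += x * v   # first-order moment along x
            m01 += y * v   # first-order moment along y
    if m00 == 0:
        raise ValueError("image contains no object pixels")
    return m10 / m00, m01 / m00
```

Subtracting the centroid from the frame center gives a translation estimate that can be used to center the object before matching.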


4.3. Hierarchical Search and Classification Method

We now detail a simple three-level hierarchical matching-search procedure that
we have found to work well when the scale of the object is known and when the
object is approximately centered (using moments or projections). Figure 5 shows
this method in block diagram form. We describe this processor with the distortion
transforms (Sect. 3) applied to the reference patterns. In the first level, the
translation is ignored and the Hough transform of the input object is matched with
all allowed rotated versions of the Hough transform of each reference object. This
search is performed for rotations φ quantized in Δφ intervals to the degree desired
and required for the given object classes and application. This can be easily
achieved by feeding the Hough transforms of the input and reference images to a
1-D correlator as shown in Fig. 5. This is because a rotation of the object gives rise
to a corresponding 1-D shift along the θ axis in the Hough domain (Sect. 3.3). The
rotation angle φ_1 corresponding to the best match and its two nearest neighbors φ_2
and φ_3 are retained as the three most probable φ values. From the centering
accuracy possible, the maximum value of t, t_max, is known.

Fig. 5. Block diagram of the HT hierarchical search and classification method.

In the second level, a value for t is assumed. (We use t_max/2.) We must still search
the distortion transforms of the reference object(s) for all expected α values for each
of the three φ estimates obtained from the previous level. This can be easily achieved by
applying only the α distortion transforms to the Hough transforms of the reference
objects (Sect. 3.6). A different α value results in a new HT that is not simply a 1-D
shifted version of the original HT. Thus, this matching in the Hough space can be
done by multiplying the corresponding elements of the Hough transforms and
adding the products. This amounts to evaluating the correlation value at the center
point. In Fig. 5, these correlations evaluated at one point are referred to as
projections. The Δα quantization used is determined by the object classes involved
and the accuracy required in the given application. The φ value and three α values
corresponding to the best match (α_1) and its two nearest neighbors α_2 and α_3 are
passed to level 3. In level 3, we search t from 0 to t_max for the three α values and the
one best φ value determined from level 2. The HT for a new t value is again a new
HT and this search in Δt increments is performed as the α search in level 2 was.
The number of t values and the range of t to be searched are set by the expected
accuracy of the centering method used. The best match yields the final t, α, φ, and
object class estimates. This concept can be extended to include a scale search as
well, with an associated increase in complexity. Section 5 details and quantifies this
hierarchical procedure for different aircraft image classification problems with
attention to the quantizations Δφ and Δα and the number of searches needed.
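The first level of the search can be sketched as a 1-D circular correlation along the θ axis. The code below is an illustrative reading of that step, not the authors' implementation: Hough arrays are lists of rows (ρ bins) by columns (θ bins), `step` is the rotation quantization in θ bins, and the function name is hypothetical.

```python
def level1_rotation_candidates(H_in, H_ref, step=1):
    """Return the best cyclic theta shift, its two neighbors, and the best score."""
    n_theta = len(H_ref[0])
    scores = {}
    for k in range(0, n_theta, step):
        # Correlation value for shift k: multiply matching elements and add.
        scores[k] = sum(
            row_in[(j + k) % n_theta] * row_ref[j]
            for row_in, row_ref in zip(H_in, H_ref)
            for j in range(n_theta)
        )
    best = max(scores, key=scores.get)
    neighbors = [(best - step) % n_theta, (best + step) % n_theta]
    return [best] + neighbors, scores[best]
```

Levels 2 and 3 reuse the same multiply-and-add score, but evaluated at a single point for each candidate α or t, since those distortions are not simple shifts of the array.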

5.  DATABASE  AND  INITIAL  TEST  RESULTS 


5.1.  Database 

The images used in our initial experiments were top-down edge (boundary)
images of five different types of aircraft with a resolution of 128 × 128 pixels.
Figure 6 shows the φ = 0 edge images of the five aircraft types used. Using
specialized software and aircraft model descriptions, various translated versions of
each image with t varied from 0 to 60 pixels within a 256 × 256 pixel image frame
were used together with different rotated and scaled versions of each image with the
scale s = 1, 2, and 3. For test inputs, t was varied continuously from 0 to 60,
whereas our t quantization used in the system was 10 pixels. Thus our t estimates
are expected to be accurate only to ±5 pixels.

Fig. 6. Edge images of the aircraft types used: (a) DC10, (b) B57, (c) F105, (d) Mirage, (e) Mig.

The α translation parameters used ranged from 0 to 315° and were quantized to
Δα = 45°. The rotation parameters φ used varied from 0° to 330° in increments
Δφ = 30°. Thus, there are 12 values of φ, 3 values of s, 6 nonzero values of t, and
8 values of α for each nonzero value of t. This makes a total of
(1 + 6 × 8) × 12 × 3 = 1764 possible combinations of the distortion parameters.

The H(θ, ρ) transform space was computed as in (1) with Δθ = 5°, Δρ = 5 and
the origin in the center of the 256 × 256 image frame. The Hough array (θ, ρ) is
thus of size 360/5 × 128/5 or 72 × 26. Byte arrays were used to store H(θ, ρ) to
256 levels from 0 to 255. The largest pixel value in all H(θ, ρ) arrays was
normalized to 255 and values below a threshold were simply set to zero to reduce
the computations in the matching process. A threshold of 40 was used for noise-free
images. The hierarchical search test results involving scale changes have not been
included in this paper. However, our experiments indicate that in order to achieve
good results with scaled images, we need to compute the Hough transform with
slightly better resolution, Δθ = 2° and Δρ = 2.
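A minimal sketch of how such a 72 × 26 array could be built is given below. It assumes the usual normal parameterization ρ = x cos θ + y sin θ with θ over the full 0° to 360° range and ρ ≥ 0 (Eq. (1) is not reproduced here, so this form is an assumption), edge-pixel coordinates measured from the frame center, and ρ rounded to the nearest multiple of Δρ. Function names are hypothetical.

```python
import math


def hough(points, dtheta=5, drho=5, rho_max=128):
    """Accumulate votes for (x, y) edge pixels; rows are rho bins, columns theta bins."""
    n_theta = 360 // dtheta
    n_rho = math.ceil(rho_max / drho)
    H = [[0] * n_theta for _ in range(n_rho)]
    for x, y in points:
        for j in range(n_theta):
            th = math.radians(j * dtheta)
            rho = x * math.cos(th) + y * math.sin(th)
            if rho < 0:
                continue  # the same line is counted at theta + 180 degrees
            i = int(rho / drho + 0.5)  # nearest multiple of drho
            if i < n_rho:
                H[i][j] += 1
    return H


def normalize_and_threshold(H, threshold=40):
    """Scale the largest value to 255 and zero everything below the threshold."""
    peak = max(max(row) for row in H)
    if peak == 0:
        return [list(row) for row in H]
    scaled = [[v * 255 // peak for v in row] for row in H]
    return [[v if v >= threshold else 0 for v in row] for row in scaled]
```

For example, a vertical line segment at x = 20 produces a single strong peak at θ = 0° in the ρ = 20 row, which survives normalization with the value 255.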

5.2. Representative H(θ, ρ) Examples

In Fig. 7a we show H(θ, ρ) for a Mirage oriented at φ = 0 and centered at the
origin and in Fig. 7b we show H(θ, ρ) for the Mirage shifted upwards by 60 pixels.
We discuss Figs. 7a and b to provide insight on the contents and pattern in the
Hough transform. The peaks in each H(θ, ρ) can be associated with the various
lines in the image. In Fig. 7a, the bright peaks at approximately θ = 270° ± 30°
correspond to the two lines that define the front edge of the wings; the peaks near
θ = 90° correspond to the back edges of the wings and the edges of the tail. The two
parallel vertical lines that define the fuselage produce peaks at ρ = 0 and θ = 0 and
180°. (Recall that ρ was discretized to integer multiples of 5.) In the Hough
transform in Fig. 7b of the Mirage translated vertically upward by 60 pixels,
the columns of the Hough array are shifted up or down by t cos(θ - α) =
60 cos(θ - 90°) = 60 sin θ, as in Eq. (6). The shifts from θ = 0 to 180° are positive
downward with the maximum shift occurring at θ = 90°. The shifts for θ between π
and 2π are negative and the original data there merges in the top portion of the
array and enters 180° away between θ = 0 and π. As seen, this causes the peaks due
to the front edges of the wings to now occur at 90° ± 30° (at smaller ρ values)
instead of at smaller ρ values at 270° ± 30° as in Fig. 7a.
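The column shifts for this example can be tabulated directly. The sketch below evaluates t cos(θ - α) for every θ column; the function name and the default Δθ = 5° are choices made for the illustration.

```python
import math


def column_shifts(t, alpha_deg, dtheta_deg=5):
    """Shift (in pixels of rho) of each theta column for a translation (t, alpha)."""
    n_theta = 360 // dtheta_deg
    return [
        t * math.cos(math.radians(j * dtheta_deg - alpha_deg))
        for j in range(n_theta)
    ]
```

For t = 60 and α = 90°, the shift is 60 sin θ: zero at θ = 0°, +60 at θ = 90°, and -60 at θ = 270°, positive over the first half of the array and negative over the second half.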

To determine the distortion of this one known class of input test object from Fig.
7b, we could produce H'(θ, ρ) for all 1764 possible sets of the distortion parameters
(s, t, α, and φ) applied to the Hough space. For each case, the new H'(θ, ρ) could
be template matched against the H(θ, ρ) reference in Fig. 7a. The distortion
parameters associated with the largest correlation value obtained are selected as the
best estimate. Figure 7c shows H'(θ, ρ) for the best match. As can be seen, it is
visually very similar to the original H(θ, ρ) in Fig. 7a. The correlation value
C_1 = 1.51 × 10^6 for the correct (t, α, φ) = (60, 90°, 0) choice was the largest one
obtained. The next largest value C_2 = 1.04 × 10^6 occurred for (t, α, φ) = (50, 90°, 0).
The maximum C_1 value compares favorably with the autocorrelation C_A = 2.38 ×
10^6 of the original H(θ, ρ). Thus, local maxima can be avoided and high confidence
in the final estimate can be obtained by ensuring that C_1 is some high fraction of
C_A (typically 0.6 C_A).

Figure 8 shows similar one-class test results for the Mirage with φ = 0° (Fig. 8a)
and φ = 120° (Fig. 8b) rotation only. The H'(θ, ρ) pattern with the best match is
shown in Fig. 8c with its C_1 value and the associated (t, α, φ) parameters. The C_2
value for the next best match is listed for completeness. Again, the correct object
distortion estimates are obtained. The variations in the C_1 values arise due to the
quantization of the Hough space. Visual inspection of Figs. 8a and b shows that
they are the same with Fig. 8b being a cyclically shifted version of Fig. 8a (with a
cyclic shift of 120° or 120°/360° = 1/3 of the H(θ, ρ) pattern). As can be seen, Fig.
8c is almost identical to Fig. 8a.

5.3. Multiple-Distortion Intra-Class Recognition Tests

This H(θ, ρ) transformation and template matching technique was then applied
to multi-class multiple-distorted versions of the five aircraft types. Columns 1-4 in
Table 1 describe the input test data. Data for three representative distorted versions
of each aircraft type are included. These initial one-class (intra-class) results assume
that the object class was known and thus only represent tests of distortion
parameter (a, b, φ) estimates. The results for both a full (brute force) search and
our hierarchical search are included. The full search method results (columns 5 and
6 of Table 1) always yield the correct estimates within the quantizations Δφ = 30°,
Δt = 10, and Δα = 45° of our distortion parameters. The estimates for translation
are given in terms of the a and b parameters, which can be easily obtained from the
t and α parameters.

TABLE 1
Selected Intra-Class Multiple Distortion Test Results

                                            Best estimates from       Best estimates from
                                            full search               hierarchical search
                                            (Δφ = 30°, Δt = 10,       (Δφ = 30°, Δt = 10,
                                            Δα = 45°)                 Δα = 45°)
Test  Aircraft  Translation  Rotation     Translation  Rotation     Translation  Rotation
no.   name      a, b         φ            a, b         φ            a, b         φ
                (pixels)     (degrees)    (pixels)     (degrees)    (pixels)     (degrees)
 1    Mirage    0, 60        0            0, 60        0            0, 60        0
 2    Mirage    -30, 30      0            -28, 28      0            28, -28      30
 3    Mirage    14, 14       120          14, 14       120          14, 14       120
 4    DC10      7, 7         270          7, 7         270          7, 7         270
 5    DC10      25, 25       270          28, 28       270          14, 14       150*
 6    DC10      30, 35       270          35, 35       270          35, 35       150*
 7    B57       5, 5         320          7, 7         330          7, 7         330
 8    B57       17, 17       320          21, 21       330          21, 21       330
 9    B57       58, 5        320          60, 0        330          50, 0        30*
10    Mig       8, 8         225          7, 7         210          7, 7         210
11    Mig       14, 14       225          14, 14       210          14, 14       210
12    Mig       45, 41       225          42, 42       210          21, 21       2?0*
13    F105      14, 14       330          14, 14       330          14, 14       330
14    F105      20, 20       330          21, 21       330          14, 14       210*
15    F105      60, 5        330          60, 0        330          60, 0        300

*Large t (t > 25).

The results using our hierarchical search method are now discussed. Note that the
test inputs are only approximately centered in these tests. The intra-class test results
on the same 15 test images using our hierarchical search method are presented in
columns 7 and 8 of Table 1. The scale s is assumed to be known. In the first level,
12 tests of φ are made (Δφ = 30°) assuming that the translation is zero (i.e.,
a = b = 0) and the three best values are passed to the second level. In the second
level, 8 values of α are tested for the three best φ values from level 1 (i.e.,
8 × 3 = 24 tests are performed). These level 2 tests are performed for a fixed
t = (a^2 + b^2)^(1/2) = 20. Since the object is assumed to be approximately centered,
t = 20 is a reasonable estimate for translation. The three best α values and the best
φ value are then passed to level 3, where six t tests for each α are made (3 × 6 = 18
tests). The total number of test matchings required is thus 12 + 24 + 18 or only 54.
This is a significant reduction from the 1764 tests required in the brute force
method. As can be seen from the results, this method gave comparable results,
except for large t (t > 25), denoted by * in Table 1. This is expected because the
simple method (assuming t = 0) used to estimate the φ value in the first level failed.
By centering the object in advance or by including several t values in level 1, near
perfect performance can be obtained.
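The two search counts can be checked by enumerating the parameter grids. The sketch below uses the quantizations stated in the text (12 rotations, 3 scales, 6 nonzero translations, 8 translation directions); the variable names are chosen for the example.

```python
from itertools import product

phis = range(0, 360, 30)      # 12 rotation values
scales = (1, 2, 3)            # 3 scale values
ts = range(10, 61, 10)        # 6 nonzero translation magnitudes
alphas = range(0, 360, 45)    # 8 translation directions

# t = 0 needs no direction, so it contributes a single translation case.
translations = [(0, 0)] + list(product(ts, alphas))
brute_force = len(list(product(translations, phis, scales)))

level1 = len(phis)            # all rotations, translation ignored
level2 = 3 * len(alphas)      # 3 best rotations x 8 directions
level3 = 3 * len(ts)          # 3 best directions x 6 magnitudes
hierarchical = level1 + level2 + level3

print(brute_force, hierarchical)
```

Note that the brute force count includes the three scale values, whereas the hierarchical count assumes the scale is known.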

5.4.  Discrimination  and  Multiple-Distortion  Performance 

Table 2 shows test results of the discrimination and recognition performance of
our hierarchical method in a multi-class case. Columns 2-4 list the selected input
test image information. The tests included four of the input aircraft types with
different multiple translation (a, b) and rotation (φ) distortions present and one
(test 5) with only a shift. The best template match for each test input with two to
four of the reference aircraft types is given (columns 5-8). In tests 1-3, we see that
both the correct aircraft class and the correct distortion parameters are obtained.
Such an excellent performance is expected when t < 25. Thus the multi-class
discrimination and intra-class recognition (multiple distortion invariance) features
of this processor have been demonstrated. From Fig. 6, we see that the F105 and
Mig images are rather similar. We thus expect discrimination between these two
aircraft types to be difficult. In test 4, we find that the Mig input would be
misclassified as an F105. Using a Hough array with higher spatial resolution could
resolve these two similar classes. If we use the fact that C_A = 0.83 × 10^6 occurs for
the Mig and that a larger C_A = 1.56 × 10^6 occurs for the F105, we can normalize
the data or set C_1 = 0.6 × C_A and realize that the observed C_1 is too large and thus
provide discrimination of such similar object classes. From Fig. 6, we also note that
the Mirage and DC10 are similar in shape as well as in size. Test 5 was included to
show that our Hough transform hierarchical technique still allows us to discriminate
between them. All the test results were the same when the brute force method was
used, i.e., identical values for the best C_1 and C_2 values were obtained.

TABLE 2
Multi-Class Multiple Distortion Recognition and Performance of Hierarchical Hough Transform
Transformations and Matching

      Input test aircraft information                 Hierarchical processor results
                                                      Best estimates (Δφ = 30°, Δt = 10, Δα = 45°)
Test  Aircraft  Translation  Rotation    Reference    a, b        φ       Correlation
no.   type      a, b         φ           aircraft                         value
                (pixels)     (degrees)
 1    DC10      7, 7         270         DC10         7, 7        270     1.77 × 10^6
                                         F105         7, 7        270     1.27 × 10^6
                                         B57          20, 0       240     1.03 × 10^6
                                         Mirage       0, 0        270     1.61 × 10^6
 2    B57       7, 7         30          B57          -7, -7      30      1.53 × 10^6
                                         DC10         14, 14      0       1.14 × 10^6
                                         F105         14, 14      60      1.00 × 10^6
 3    F105      -7, 7        330         F105         7, 7        330     1.45 × 10^6
                                         DC10         0, 10       330     1.30 × 10^6
                                         B57          7, 7        330     1.11 × 10^6
                                         Mig          -10, 0      330     1.00 × 10^6
 4    Mig       -14, 14      150         F105         -21, -21    150*    1.05 × 10^6
                                         Mig          14, 14      150     0.79 × 10^6
 5    Mirage    0, 60        0           Mirage       0, 60       0       1.51 × 10^6
                                         DC10         0, 60       0       1.16 × 10^6

*Misclassified.

Fig. 9. Image of the Mirage with noise with (a) σ_n = 0.2, (b) σ_n = 0.25, and (c) σ_n = 0.3.

5.5. Noise Performance

To determine and quantify the performance of these methods in the presence of
noise, noisy input images were generated as follows. Random noise with a Gaussian
distribution and of zero mean and different variance σ_n was added to each pixel in
the test image. The resulting image was then rebinarized by thresholding it at 0.5.
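The noise procedure can be sketched in a few lines. The example below assumes a binary image stored as a list of rows of 0s and 1s and treats the noise parameter as the standard deviation passed to the Gaussian generator, which is an assumption since the text calls it a variance; the function name and the optional seed are also illustrative.

```python
import random


def add_noise(image, sigma, seed=None):
    """Add zero-mean Gaussian noise to each pixel, then rebinarize at 0.5."""
    rng = random.Random(seed)
    return [
        [1 if v + rng.gauss(0.0, sigma) > 0.5 else 0 for v in row]
        for row in image
    ]
```

With this procedure a background pixel flips to 1 when its noise sample exceeds 0.5, and an edge pixel flips to 0 when its sample falls below -0.5, so both spurious and missing edge points appear as the noise level grows.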

Figure 9 shows the image of the Mirage when noise with σ_n = 0.2, σ_n = 0.25, and
σ_n = 0.3 was added. Table 3 shows the performance of our full search and hierarchical
methods for intra-class multiple distortion estimation with a noise variance
σ_n = 0.2. As seen, all results are perfect in the case of the brute force method (within
our quantization). The results in the case of the hierarchical method are correct
except in the case of test 3. When σ_n was increased to 0.3, the brute force method
still gave the same results, but the hierarchical method was in error in 30-50% of the
cases, with the φ estimate in level 1 generally being the estimation parameter in
error.
TABLE 4
Multi-Class Multiple Distortion Recognition and Performance of Hierarchical Hough Transform
Transformations and Matching When Noise with σ_n = 0.25 Was Added

When discrimination performance with multiple distortions for σ_n = 0.2 was
tested, the results obtained were essentially the same as those for the noise-free cases
in Table 2. However, when σ_n was increased to 0.25 (Table 4), the method is found
to make an additional error, with the F105 being wrongly classified as a DC10 (test
3, Table 4). A threshold of 80 in the Hough space was used for these noise tests.

The signal to noise ratio (SNR) in these tests can be computed as

$$\mathrm{SNR} = \frac{N_b}{N_a + N_r} \qquad (4)$$

where N_b is the number of boundary pixels on the noise-free target, N_a is the
number of background pixels added and N_r is the number of target pixels
removed. For σ_n = 0.2 (0.25) and the Mirage aircraft, SNR ≈ 1.1 (0.316). Thus, our
observed performance is excellent in the case of poor input SNR.
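A short sketch of this SNR measure follows, assuming it is the number of boundary pixels on the noise-free target divided by the sum of the background pixels added and the target pixels removed. Images are lists of rows of 0s and 1s, and the function name is hypothetical.

```python
def boundary_snr(clean, noisy):
    """Ratio of clean boundary pixels to pixels changed by the noise."""
    n_boundary = n_added = n_removed = 0
    for clean_row, noisy_row in zip(clean, noisy):
        for c, n in zip(clean_row, noisy_row):
            n_boundary += c
            if c == 0 and n == 1:
                n_added += 1      # background pixel turned on by noise
            elif c == 1 and n == 0:
                n_removed += 1    # target pixel turned off by noise
    changed = n_added + n_removed
    return float("inf") if changed == 0 else n_boundary / changed
```

An SNR below 1 under this definition means that more pixels were corrupted than there are edge pixels on the original target.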

6. SUMMARY AND CONCLUSIONS

A new approach using the basic Hough transform defined for straight lines has
been suggested for estimating the scale, translation and rotation distortion parameters
of an input test object. The method is capable of multi-class object discrimination
and multiple-distortion object recognition. Test results on aircraft imagery were
provided and shown to be excellent for multi-class discrimination, distortion parameter
estimation and in the presence of noise. The new direct use of the Hough space
is possible by use of the new and efficient Hough transform distortion transformations
developed. A new hierarchical search method was devised that allows efficient
realization of the proposed concept. This technique also allows the Hough space to
be spatially quantized, thereby further simplifying realization. If the translation of
the object is large, the use of moments (or similar methods) to center the object,
combined with a 1-D correlation and followed by matching with a few
distortion-transformed images, provides the class, scale, rotation and translation estimates. For
the accurate estimation of scale, a higher spatial resolution in the Hough space is
required.

Optical iconic filters for large class recognition


David Casasent and Abhijit Mahalanobis


Approaches are advanced for pattern recognition when a large number of classes must be identified.
Multilevel encoded multiple-iconic filters are considered for this problem. Hierarchical arrangements of
iconic filters and/or preprocessing stages are described. A theoretical basis for the sidelobe level and noise
effects of filters designed for large class problems is advanced. Experimental data are provided for an optical
character recognition case study.


I. Introduction

Advanced artificial intelligence, symbolic, and other
processors required to operate on large knowledge
bases [1,2] need techniques to handle a large number of
object classes. We consider pattern recognition applications
when the number of object classes to be identified
is large. Our approach can be applied to logic
processors (in which the input is a query) and to symbolic
and associative [3] processors. However, pattern
recognition offers a more easily defined problem, and
thus we pursue this specific application. We employ
an optical character recognition (OCR) case study example
to quantify and demonstrate remarks and results,
since such a data base is easily available. Much
recent pattern recognition research has addressed algorithms
to achieve distortion invariance, i.e., recognition
of geometrically distorted versions of an object [4-6].
In this paper we consider large class problems in which
the number of different objects is large. Incorporation
of distortion-invariant techniques into the filters we
discuss can further broaden their use. Since the filters
we discuss operate on input image pixel representations,
we refer to them as iconic filters [7].

Section II describes our OCR data base, and Sec. III
reviews several basic iconic filter synthesis algorithms.
In Sec. IV we advance a theoretical analysis of the
effect of the number of training images and object
classes on the output sidelobe level and the noise sensitivity
of iconic filters. Section V describes several
systems to achieve large class recognition without the
iconic filter problems associated with large training
sets of data. Experimental data are then provided
(Sec. VI) to quantify and demonstrate all major points
advanced.

II.  Data  Base  and  Case  Study 

As an easily obtainable data base we selected recognition
of the 62 characters (26 lower-case and 26 upper-case
letters, plus the 10 number digits) in a variety of
fonts. We obtain 80 × 80 pixel images of the 62 characters
from 15 different magazines: Time, Scientific
American (Scienam), Datamation (Datama), Business
Week (Busweek), etc. We will refer to the fifteen
versions of each character as fonts (although they represent
different point sizes of each character as well).
In our experiments, we will view these as in-class variations.
Font identification can be achieved by other
methods [8,9]. Our filters are thus designed to be able to
provide the recognition of each character independent
of the input font, but without the requirement to identify
the input data font. This choice also allows us test
data that are not present in the training set used to
synthesize the filters. Figure 1 shows several characters
from three of the magazines to demonstrate the
similarity and differences in the fonts present in our
data base.

III.  Iconic  Filter  Synthesis 

The basic filters considered are extensions of one
type [10] of distortion-invariant matched spatial filter,
with attention to our present application. For completeness
we review three types of these filters and
three classes of filters possible. This section also allows
the terminology to be defined.

Fig. 1. Typical characters from three different publications: (a)
The New York Times; (b) Datamation; and (c) Scientific American.

We denote objects in one class by {f_n} and objects in a
second class by {g_n}. The members within each class
are generally different 3-D geometrically distorted
versions (e.g., aspect views) of each object. In our
OCR application the members within each class will be
different font representations of each input character/object.
We denote vector versions (e.g., lexicographically
ordered images) of the objects by f_n and g_n and
the filters designed by h_k (all are 2-D images, or vectors).
When f_n and g_n are similar (such a filter to
recognize one class must also have information on the
other class), we specify a filter h so that

$$\langle \mathbf{f}_n \cdot \mathbf{h} \rangle = 1, \qquad \langle \mathbf{g}_n \cdot \mathbf{h} \rangle = 0 \qquad (1)$$

for all n, where ⟨ ⟩ denotes the vector inner product
operation f^T h. We restrict all filters to be linear combinations
of all training set images

$$h(x,y) = \sum_{n=1}^{N_1} a_n f_n(x,y) + \sum_{n=1}^{N_2} a_{N_1+n}\, g_n(x,y). \qquad (2)$$

For N_1 images in {f_n} and N_2 images in {g_n}, the N_1 + N_2
coefficients a_n define the filter function. The coefficient
vector a and hence the filter function h are the
solution of Va = u, where V is the vector inner product
matrix of the data set, and u = u_1 = [1 ... 1, 0 ... 0]^T is
set by Eq. (1) to yield 1 outputs for all N_1 images in
class one and 0 outputs for all N_2 images in class two.
The filter is thus specified by

$$\mathbf{a} = \mathbf{V}^{-1}\mathbf{u}_1. \qquad (3)$$

To recognize {g_n} and reject {f_n}, the control vector u_1 in
Eq. (3) is simply changed to [0 ... 0, 1 ... 1]^T, and a new
set of weights a is determined.
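The synthesis step can be sketched as follows, assuming the training images are supplied as flat vectors that are linearly independent (so that the inner product matrix V is invertible). The small Gauss-Jordan solver and all function names are illustrative helpers, not part of the original work.

```python
def dot(u, v):
    """Vector inner product."""
    return sum(x * y for x, y in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def synthesize_filter(train, u):
    """Return h = sum_n a_n * train[n], with the weights a solving V a = u."""
    V = [[dot(p, q) for q in train] for p in train]  # inner product matrix
    a = solve(V, u)
    return [sum(a_n * img[k] for a_n, img in zip(a, train))
            for k in range(len(train[0]))]


# Two class-one vectors (desired output 1) and one class-two vector (output 0).
f = [[1, 0, 1, 0], [1, 1, 0, 0]]
g = [[0, 1, 0, 1]]
h = synthesize_filter(f + g, [1, 1, 0])
```

Changing the control vector to other values (for example 1s and 2s, or the bits of a code word) reuses the same solver, since only the right-hand side u changes.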

A multilevel filter with outputs equal to one for class
one objects and two for class two objects can easily be
fabricated using the control vector u_1 = [1 ... 1, 2 ... 2,
3 ... 3]^T in Eq. (3). As shown, extensions of this filter
to more than two classes are possible. Binary-encoded
multiple filters can also be employed. In this case the
outputs from the filters define a digital word (e.g., 10,
01, 11, for the case of F = 2 filters) that denotes the
object class (e.g., if the outputs from the two filters are
both 1, the code word is 11 and the input test object is
in class three). Synthesis of these filters uses the same
basic technique in Eq. (3) with different u control
vectors.


Fig.  2.  Multichannel  frequency  plane  correlator  with  /■  =  1  iconic 
matched  spatial  filters  for  large  class  pattern  recognition.1 


For large class problems we propose the use of multilevel multiple iconic filters (specifically F filters with L output levels). The output from such a system is now an F-digit word (one output/filter) and is thus capable of representing L^F different states or object classes (in practice L^F − 1 states are obtained since the all-zero state can also occur for no input object). Prior work on such filters has shown quite promising results. However, attention has been given to their distortion invariance, and no more than four object classes have been considered for use in such filters.
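The F-digit, L-level code word can be sketched as a base-L number. The digit values 0 ... L−1 and the function names below are illustrative assumptions; in an actual filter each digit would correspond to a specified output level.

```python
def class_to_code(k, F, L):
    """Encode class index k (1 .. L**F - 1) as F digits in base L.

    The all-zero word is reserved for "no input object", so k starts at 1.
    """
    if not 1 <= k <= L ** F - 1:
        raise ValueError("class index outside the L**F - 1 usable states")
    digits = []
    for _ in range(F):
        digits.append(k % L)
        k //= L
    return digits[::-1]  # most significant digit first

def code_to_class(digits, L):
    """Decode an F-digit base-L word back to its class index."""
    k = 0
    for d in digits:
        k = k * L + d
    return k

# Binary case from the text: F = 2 filters, both outputs 1 -> class three.
word = class_to_code(3, F=2, L=2)
```

With F = 4 and L = 3 this scheme gives 3^4 − 1 = 80 usable states under the assumption that one level is zero.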

Three different classes of such iconic filters can be identified.10 The filters described above are projection filters since the formulation specifies only the central or peak value in the correlation of h and the input object. For many object classes (especially when the total number of training images N_T is small), control of the central peak value in the correlation function allows sufficient performance and sufficiently low sidelobe levels. We address this issue in detail in Sec. IV. For cases when the sidelobes for one object class are larger than the peak values for other classes (or larger than the value at the center of the correlation function for the same object class), correlation filters can be used. These filters10 use shifted versions (typically four) of each training set image to control the shape of true correlation peaks (i.e., they specify a fixed value at the center of the correlation function and zero values at ±d pixels away, horizontally and vertically). These filters require five times the number of training images that are needed in the projection filter, and hence N_T effects for these filters will be worse. The best peak to sidelobe ratio (PSR) in the output correlation pattern is obtained with a PSR iconic filter.10 The disadvantage of this filter is that its peak value cannot be specified. Thus, since multilevel encoding is not possible with such a filter, the number of classes that one can accommodate using multiple PSR filters is significantly reduced.

These three filters are typically used as the filters in a frequency plane correlator. Figure 2 shows the classic frequency plane correlator with four frequency-multiplexed filters at P_2 and four output correlation planes at P_3. These F = 4 correlation planes are read out in parallel in raster format in synchronization. From the F = 4 digit output word obtained for each pixel location in the output correlation planes, the class or category of each region of the input image at P_1 can be obtained.11 The use of more than four parallel correlation planes is generally prohibitive, and thus such an architecture can accommodate L^F = L^4 object classes. To accommodate large class problems, multilevel filters (L > 2) are thus essential.

These filters can also be applied to associative memories as detailed elsewhere.12 The classic system is shown in Fig. 3. Here the input 1-D vector data x at P_1 describes an input object, and the F filters at P_2 are the columns of the associative memory matrix M. The P_3 output vector v is the F-digit encoding of the input object from which one can decode the object into a member of one of L^F classes.

IV.  Large  Training  Class  Effects  on  Iconic  Filter 
Performance  (Theory) 

In numerous tests of the iconic filters described in Sec. III we noted that the performance of the projection and correlation filters degraded (i.e., large sidelobe levels occurred) as the number of training set images N_T was increased. For our large class problems of present concern N_T will also be large, and thus this issue is of significant concern. Thus we now address this issue theoretically for the case of correlation iconic filters. Solution of large matrices that arise in large class problems can be addressed by advanced techniques and is not of immediate concern here. The analysis is simplified by considering the Fourier transform of the correlation plane. Specifically, we consider the average (or mean) μ and scatter S of the magnitude of the Fourier transform of the correlation function. The average μ value equals the peak value in the correlation plane (this follows from Parseval's theorem)

$$\sum_{m} f(m)\,h^{*}(m) = \frac{1}{M}\sum_{k} F(k)\,H^{*}(k), \tag{4}$$

where f and h are 1-D sequences, F and H are their Fourier transforms, and the summation is over the number of pixels M in each image. We thus write the average for an input image f_k and a linear combination filter h (described by coefficients a_n) as

$$\mu = E[F_k H^{*}] = \sum_{n} a_n E[F_k F_n^{*}] = \sum_{n} a_n v_{kn} = u_k, \tag{5}$$

where v_kn denotes element (k,n) of the matrix V, and u_k is element k of the control vector u in Eq. (3). The scatter S in the Fourier transform of the correlation is a measure of the ripple or sidelobes present in the output correlation plane. Using Eq. (4) and the filter synthesis of Eq. (3), the scatter is shown to satisfy

N  =  1 1  / / * /-', |  1  —  (i  ‘  Iz  X  X  r, ,  -  fi 

■S'  >  r, ,  I  a  1  Vr  a  I  —  e  .  (*>l 
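The Parseval relation of Eq. (4), which links the correlation peak to the mean of the transform-domain product, can be checked numerically. The sketch below assumes an unnormalized forward DFT (so the factor 1/M appears on the frequency side) and uses short made-up sequences; the names `dft`, `lhs`, and `rhs` are illustrative.

```python
import cmath

def dft(x):
    """Unnormalized discrete Fourier transform of a 1-D sequence."""
    M = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / M) for m in range(M))
            for k in range(M)]

f = [1, 0, 1, 1]            # hypothetical binary input sequence
h = [0.5, -0.2, 0.3, 0.1]   # hypothetical real filter sequence

# Image-domain side: central value of the correlation of f and h.
lhs = sum(a * b for a, b in zip(f, h))

# Frequency-domain side: (1/M) * sum_k F(k) H*(k).
F, H = dft(f), dft(h)
rhs = sum(Fk * Hk.conjugate() for Fk, Hk in zip(F, H)) / len(f)
```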

Fig. 3. Multiple iconic projection filter associative processor system.

We now consider how S varies as the number of training images N_T increases. Since the matrix V is symmetric and positive definite, we decompose it and easily show

$$\mathbf{a}^{T}\mathbf{V}\mathbf{a} = \sum_{n} \alpha_n^{2}\lambda_n, \tag{7}$$

where λ_n are the eigenvalues of V, and α_n² are positive constants. The term v_kk in Eq. (6) is positive (since these diagonal elements correspond to the autocorrelation of positive images). Similarly λ_n > 0 in Eq. (7) since V is a positive definite matrix. Although the terms in Eq. (7) are positive, the values of the individual α_n and λ_n change with N_T. Hence for increasing N_T the sum in Eq. (7) [and hence the scatter in Eq. (6)] may increase or decrease. It can be shown
that


where c is a positive constant. This sum clearly increases with N_T and is an upper bound on Eq. (7). Thus the scatter S in Eq. (6) (and hence the correlation plane sidelobes) increases as the number of training images increases. Extensions of this theoretical treatment to the various other classes of iconic filters yield the same trend for the correlation sidelobes and the scatter S to increase with N_T.
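The decomposition behind Eq. (7) can be written out as a short worked step. The orthonormal eigenvectors φ_n and the identification of the constants with projections of a onto them are assumptions introduced here for illustration; the source gives only the final form.

```latex
\mathbf{V} = \sum_{n} \lambda_n \boldsymbol{\phi}_n \boldsymbol{\phi}_n^{T},
\qquad \boldsymbol{\phi}_m^{T}\boldsymbol{\phi}_n = \delta_{mn},
\qquad
\mathbf{a}^{T}\mathbf{V}\mathbf{a}
  = \sum_{n} \lambda_n \left(\boldsymbol{\phi}_n^{T}\mathbf{a}\right)^{2}
  = \sum_{n} \alpha_n^{2}\lambda_n ,
\qquad \alpha_n = \boldsymbol{\phi}_n^{T}\mathbf{a}.
```

Each term is nonnegative because λ_n > 0 for a positive definite V, which is the property the argument above relies on.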

In numerous tests we also observed (when more training images were used) that the dynamic range requirements of the filter and its noise requirements became more severe. We now advance a theoretical basis for this effect. We consider the average m_F and the scatter S_F of the pixels in the filter image (denoted by the subscript F). The average and scatter now considered apply to the image plane representation of the filter function and not the output correlation plane. As S_F increases, the variations in the pixel values in the filter image itself increase and hence so does the number of levels required in the filter image and also the effects of noise (we will demonstrate this experimentally in Sec. VI). The mean of the filter image is

$$m_F = E[h] = E\Big[\sum_{n} a_n f_n\Big] = \sum_{n} a_n E[f_n] = \sum_{n} a_n v_{nn}/M, \tag{8}$$

where a linear combination filter is again assumed, and where the last equality in Eq. (8) is obtained by estimating E[f_n] by v_nn/M, where M is the number of pixels in the image. This approximation is realistic for our binary images, where v_nn is the dot product of image f_n and itself. From Eq. (8), the mean of the filter is thus seen to be proportional to the sum of the diagonal elements of V weighted by the a_n linear combination filter coefficients.

Proceeding similarly, the scatter is found to be

$$S_F = E[h^{2}] - E^{2}[h] \tag{9a}$$

$$= \frac{1}{M}\Big[\sum_{n} a_n^{2} v_{nn}\,(1 - v_{nn}/M) + \sum_{n}\sum_{m \neq n} a_n a_m\,(v_{nm} - v_{nn}v_{mm}/M)\Big] \tag{9b}$$

$$\simeq (1/M)\,\mathbf{a}^{T}\mathbf{V}\mathbf{a}. \tag{9c}$$

For cross products v_nm we have used a similar estimation for the expected value E[f_n f_m] = v_nm/M. The double summation in the second term of Eq. (9b) does not include n = m. The final relation in (9c) assumes 1 − v_nn/M ≈ 1 and (v_nm − v_nn v_mm/M) ≈ v_nm. These approximations are valid for our OCR character example, where the average auto projection value is v_nn ≈ 100, the average cross projection value is v_nm ≈ 50, and the number of pixels per image is M = 6400. From Eq. (7) we see that S_F in Eq. (9) increases with the number of training images N_T. This increases the filter's dynamic range. As we quantify in Sec. VI, this makes the effect of noise more significant in filters synthesized from a large number of training images N_T. In Sec. V we advance various ways to reduce N_T and yet achieve large class recognition.
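The size of the approximation made in going from Eq. (9a) to Eq. (9c) can be gauged with the OCR values quoted above (v_nn ≈ 100, v_nm ≈ 50, M = 6400). In this sketch the three-image matrix V and the coefficients a are made-up illustrative values, and `filter_scatter` is a hypothetical helper name.

```python
def filter_scatter(a, V, M):
    """Return (m_F, exact S_F, approximate S_F) for a linear combination filter.

    m_F follows Eq. (8); the exact scatter is E[h^2] - E^2[h] with
    E[f_n f_m] estimated as v_nm / M; the approximation is (1/M) a^T V a.
    """
    n = len(a)
    e_h2 = sum(a[i] * a[j] * V[i][j] for i in range(n) for j in range(n)) / M
    mean = sum(a[i] * V[i][i] for i in range(n)) / M
    return mean, e_h2 - mean ** 2, e_h2

M = 6400
V = [[100 if i == j else 50 for j in range(3)] for i in range(3)]
a = [0.01, 0.01, 0.01]  # assumed filter coefficients
m_F, s_exact, s_approx = filter_scatter(a, V, M)
rel_error = (s_approx - s_exact) / s_approx
```

For these numbers the dropped term E²[h] is a few percent of (1/M) aᵀVa, which is consistent with the approximations being described as valid for the OCR example.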

V.  Large  Class  Solutions 

In this section we advance several solutions to the large class recognition problem with attention to the degraded performance of iconic correlation filters expected when a large set of training images is used. In Sec. VI we advance experimental verifications of many of the suggested solutions. We note that our theory in Sec. IV applies not only to correlation filters, but also to projection filters if one does not look only at the correlation peak point. If projection filters are interrogated at the peak point only, the only limitation on N_T is in solving the synthesis Eq. (3). We will use this fact in several of our suggested solutions. Figure 4 shows the block diagram of a hierarchical iconic filter system.11 The first stage of this processor employs multiple PSR filters in a shift-invariant correlator. The purpose of this first stage is only to locate candidate objects in the input field of view. The filters used are designed with this in mind, and thus they do not provide discrimination information. To provide enhanced detectability, PSR iconic filters are preferable for this stage of the processor. The second stage of the processor can employ multiple correlation or projection filters in the same processor. These filters allow large class identification (when multilevel outputs are provided), but they can have large sidelobe levels. By using the outputs from the PSR correlator in the first stage to determine where to look in the output correlation planes from the second stage, sidelobe effects can be avoided. In Fig. 4, we show a projection filter second stage, since it allows L^F class identification with F filters and with a simpler processor such as that of Fig. 3. This filter (and its associated matrix) also requires fewer training set images (less by a factor of 5) than are needed in the correlation iconic filter synthesis. An additional stage with correlation filters is often preferable in such a system, since some false peaks will occur in the first-stage processor, and the investigation of these points using only projection filters will force some object class decision for all regions of interest in the input scene (detected by the first filter stage). Error correlation13 is another solution that can allow projection filters to be used directly without an additional stage of correlation filters to remove false region of interest peaks from the PSR filter.

Fig. 4. Block diagram of a hierarchical iconic filter system for large class pattern recognition.

Another modification to the system of Fig. 4 is to perform feature space analysis in windows around the candidate region of interest areas indicated by the PSR iconic filters in the first stage. When F feature space discrimination functions are used and encoded in an F-output L-level manner, a larger number of classes (L^F) can again be identified and classified. If we restrict analysis to only the central value of the output from the projection filters, these filters are in essence feature space linear discriminant functions that can operate on image pixel data (iconic filters) or on image features with equal facility.

In cases when the object size is known or can be bounded, the window around each region of interest image area can be set and simple techniques can be used to place the object in each region of interest into one of several super classes (e.g., one of 4 sets of 16 characters each). For the OCR case we have found simple object histograms and the number of pixels in the character and in different parts of it to work quite well to provide such super-class separation. Such information then allows the use of separate filters, each optimized on the smaller super class of possible objects and each with significantly fewer N_T training images. We have demonstrated iconic multilevel multiple filters in which the object class is known and the purpose of the filters is to determine the object orientation.12 This represents yet another extension of this hierarchical filtering concept.

For a specific problem (such as OCR) other information
is available, such as: letters lie on lines with regular
Table I. Correlation Plane PSR = μ/S for Multilevel Multiple Iconic Correlation Filters as a Function of the Number of Object Classes.

Number of training images N_T (5/class)    Correlation plane PSR
10                                         2.04
20                                         1.98
40                                         1.76
60                                         1.48
100                                        1.52
200                                        0.98
400                                        0.0 '
9110                                       0.006

Table II. Filter Image Plane Scatter S_F and Largest Pixel Value as a Function of the Number of Object Classes N_T for Different Multilevel Multiple Iconic Projection Filters.

N_T      S_F (scatter)    Maximum pixel value
•>       0.02             0.05
15       0.08             0.06
25       0.18             0.10
35       0.35             0.22
75       0.78             0.82
115      0.87             0.95
i:w      0.89             0.96
150      0.92             1.29
170      1.05             1.62
190      1.16             1.66
248      1.23             2.31
980      18.10            9. On

spacings  dependent  on  the  font  of  the  input  data.  For 
this  case  we  find  that  simple  horizontal  and  vertical 
projections  can  locate  lines  of  text  and  isolate  the 
letters  on  each  line.  In  this  case  the  center  of  each 
character  can  be  determined  quite  simply  with  such  a 
simple  preprocessing  step. 
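The projection-based preprocessing described above can be sketched as follows. The page is assumed to be a binary array (list of rows) with blank rows between text lines and blank columns between characters; `runs` and `locate_characters` are hypothetical helper names, not the authors' implementation.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive nonzero entries, end exclusive."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out

def locate_characters(page):
    """Find (top, bottom, left, right) boxes using row and column projections."""
    row_profile = [sum(row) for row in page]           # horizontal projection
    boxes = []
    for top, bottom in runs(row_profile):              # one run per text line
        col_profile = [sum(page[r][c] for r in range(top, bottom))
                       for c in range(len(page[0]))]   # vertical projection
        for left, right in runs(col_profile):          # one run per character
            boxes.append((top, bottom, left, right))
    return boxes

page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
]
boxes = locate_characters(page)
centers = [((t + b - 1) / 2, (l + r - 1) / 2) for t, b, l, r in boxes]
```

The center of each box gives the point at which a projection filter would be interrogated, so no shift-invariant search is needed in this special case.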

A related issue of concern is training set selection. In many cases attention to this issue can significantly reduce N_T. As an example we refer to our OCR case study with 15 fonts of each character available. We must select at least one image of each character. However, not all 15 fonts/character are required to be included in the training set. To select the fonts to be included we look at the cross correlations of each and select those with the smallest vector inner product matrix entry v_mn. This ensures us of the most new information for each additional training set image chosen. If the separation between output levels in a multilevel filter is ΔL, we select v_mn < 0.5ΔL as a useful guideline to determine when to include a given font image in our training set. In Sec. VI we show quantitative data on the ability of iconic filters to recognize characters in new fonts not included in the training set data.

VI.  Experimental  Results 

To obtain a quantitative estimate of the number of object classes one can include in a correlation filter, multilevel multiple iconic correlation filters were computed with one object (character)/class or font and five shifted versions of each (thus N_T/5 equals the number of classes and fonts). For each case, μ and S of the FT of the correlation plane were calculated. The resultant PSR = μ/S is listed in Table I. Assuming PSR > 1.5 is required, we find that only N_T = 100, or 20 object classes, could be included in one OCR correlation filter. We note that we have found that this value is much less for characters than for other objects, and thus OCR appears to represent a worst-case guideline.

To quantify the effect of N_T on the dynamic range of the filter and its image plane variance, we computed the mean m_F and scatter S_F in the filter's image for multilevel multiple projection iconic filters with different numbers of training images used (with one image/class and with N_T now equal to the total number of object classes or fonts). These data are shown in Table II. In Table II we also include the value of the largest pixel in the iconic image plane filter. We note that the scatter (variance of the pixel values in the filter) increases with N_T. The maximum pixel value in the filter image increases with N_T. The number of filter image pixels with large values also increases with N_T. Thus more dynamic range or gray levels are required to represent filters synthesized with large N_T. Also, when noise is present, if the noise changes one of the large-valued (or key) image pixels, this will have a much larger effect than if other image pixels are changed. Since the number of such key pixels and their relative significance increases with N_T, we expect noise effects to become worse for large class filters synthesized from a large number of images. We now quantify this result and the amount of noise allowable.

The filters considered in subsequent tests were synthesized from 62 characters with 4 fonts of each (the fonts used were NY Times, Datama, Busweek, and Forbes). The multilevel multiple projection filters used F = 4 filters with L = 3 levels (0.33, 0.66, and 0.99), thus allowing L^F = 3^4 = 81 classes, which is sufficient to accommodate the 62 character classes. When these F = 4 filters were shown any of the 62 × 4 = 248 characters, the projection values were ideal and perfect 100% recognition was obtained. Table III shows the worst-case outputs (all are within 10^-4 of the exact projection values).

We now consider the effect of noise on the performance of these filters. To produce the noise we generated a random array of numbers between 0 and 1. By thresholding this array at α, we produced a binary noise array N(x,y) with pixels equal to 1 if their value was < α. We then applied the same N(x,y) to each character image with image pixels changed (0 to 1 or 1 to 0) if the corresponding (x,y) pixel in N(x,y) is 1. We refer to the result as an image with binary noise. Test results for α = 0.5, corresponding to σ²_noise = 0.25, for the font Busweek are shown in Table IV. Only the worst-case results are shown (those data with projection values which departed by the most from the ideal values). The projection values are shown with their difference from the ideal values given in parentheses. As seen, 61 of the 62 images were correctly identified. We assume
Table III. Worst-Case Tests of 100% Perfect Performance 248 Class Set of Four Multilevel (0.33, 0.66, 0.99) Iconic Filters

Input test     Response for filters F1-F4
character      F1          F2          F3          F4
E              0.4299      0.6601      0.0000      0.6599
T              0.6600      0.4299      0.2001      0.9899
h              0.6599      0.0000      0.9899      0.6599
I              0.9900      0.0299      0.4400      0.9901
6              O.J.'iOl    0.0299      0.0099      0.0000

Table IV. Worst-Case Binary Noise Test Results (α = 0.5, σ² = 0.25, Busweek)

Input test     Response (and error) for filters F1-F4
character      F1            F2            F3            F4
E              0.24(0.09)    0.55(0.11)    0.62(0.04)    0.75(0.24)*
r              0.57(0.09)    0.28(0.05)    0.29(0.04)    0.91(0.08)
W              0.58(0.08)    0.55(0.02)    0.60(0.06)    1.04(0.04)
h              0.68(0.02)    0.60(0.06)    0.89(0.10)    0.62(0.04)
u              0.62(0.04)    0.85(0.11)    0.92(0.07)    0.90(0.09)
1              0.90(0.09)    0.24(0.09)    0.28(0.05)    0.91(0.08)
0              0.28(0.05)    0.20(0.14)    0.64(0.01)    0.56(0.10)
0              0.27(0.06)    o.gyio.iMi    0.57(0.09)    0.40(0.04)
9              0.28(0.05)    0.58(0.08)    0.29(0.04)    0.41(0.02)

Table V. Worst-Case Gray Level Noise Test Results (σ² = 0.1, Forbes)

Input test     Response (and error) for filters F1-F4
character      F1            F2             F3             F4
V              0.47(0.04)    0.91(0.08)     0.95(0.04)     0.90(0.09)
H              0.88io.88i    0.41(0.02)     0.88(0.05)     0.42(0.01)
V              0.58(0.08)    0.41(0.02)     0.62(0.04)     0.70(0.04)
t              1.17(0.18)    0.89(0.06)     11.20(0.18)    0.54(0.12)
a              0.91(0.08)    0.44(0.01)     0.87(0.04)     0.96(0.08)
X              1.04(0.04)    0.44(0.01)     0.60(0.06)     0.91(0.08)
5              0.41(0.118)   0.27(0.06)     0.62(0.04)     0.97(0.02)
6              0.21(0.12)    0.28(0.05)     1.02(0.08)     0.81(0.02)
9              0.48(0.05)    0.68(0.02)     0.85(0.02)     0.82(0.1111
O              0.47(0.04)    0.0210 o| i    0.49(0.06)     0.27(0.06)

projection values with errors below ΔL/2 = 0.165 will
be correctly thresholded.
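The binary noise procedure and the ΔL/2 thresholding rule can be sketched as below. The function names, the seeded generator, and the tiny test image are illustrative assumptions. The quoted pair α = 0.5, σ² = 0.25 matches the Bernoulli variance α(1 − α), which is assumed here as the relation between threshold and noise variance.

```python
import random

def binary_noise_mask(rows, cols, alpha, rng):
    """N(x,y): 1 where a uniform random number in [0, 1) falls below alpha."""
    return [[1 if rng.random() < alpha else 0 for _ in range(cols)]
            for _ in range(rows)]

def apply_mask(image, mask):
    """Flip image pixels (0 to 1 or 1 to 0) wherever the mask is 1."""
    return [[p ^ n for p, n in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]

def nearest_level(value, levels=(0.33, 0.66, 0.99)):
    """Threshold a projection value to the closest ideal output level."""
    return min(levels, key=lambda lv: abs(value - lv))

rng = random.Random(0)                      # fixed seed, for repeatability only
mask = binary_noise_mask(4, 4, 0.5, rng)    # the same mask is reused per image
image = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
noisy = apply_mask(image, mask)

noise_variance = 0.5 * (1 - 0.5)            # alpha * (1 - alpha) = 0.25
decoded = nearest_level(0.75)               # an error of 0.24 exceeds 0.165
```

A projection of 0.75 for an ideal level of 0.99 is decoded as 0.66, which is the kind of single error marked in Table IV.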

Binary noise is typical of the noise expected in OCR applications." We next provided gray-level noise tests. We generated zero-mean Gaussian noise at different variances and added this to each image. We set pixels below 0 to 0 and pixels above 1 to 1, but retained all noise gray levels between 0 and 1. Tests were conducted of all 248 images with noise present. The worst-case results for the font Busweek are shown in Table V in the same format used in Table IV. As seen, 60 of the 62 images were correctly identified. The gray-level noise used had σ²_noise = 0.1. When the noise variance was reduced to σ²_noise = 0.08, we obtained 100% correct recognition of all characters. We note that the input SNR is about 31 for σ²_noise = 0.08. Figure 5 shows several binary and gray-level noisy input images correctly identified.
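The gray-level noise test can be sketched in the same way: zero-mean Gaussian noise of a given variance is added to each pixel and the result is clipped to the range 0 to 1. The function name and the sample image are assumptions for illustration.

```python
import random

def add_gray_noise(image, variance, rng):
    """Add zero-mean Gaussian noise, then clip every pixel to [0, 1]."""
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

rng = random.Random(0)  # fixed seed, for repeatability only
image = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
noisy = add_gray_noise(image, 0.1, rng)    # variance used for Table V
clean = add_gray_noise(image, 0.0, rng)    # zero variance leaves pixels as-is
```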

We now return to Table II and our theoretical analysis indicating that noise sensitivity and the number of key image pixels increase with N_T. Refer to Table IV, which shows that the projection of the letter E on filter F4 was 0.75 (in error by 0.24) with σ²_noise = 0.25. We reduced the noise threshold to produce noise with σ²_noise = 0.24 (only 0.01 different from the prior value). For this noisy image of the letter E we found the projection of the letter E on the fourth filter to be 0.98 (nearly the ideal 0.99 level). Thus with a slightly different noise realization or a slightly different noise level (such that key image pixels were not affected), much larger noise levels can be tolerated. By selecting different projection values for different images and by assigning similar projection codes to similar characters, control over the number of key filter pixels and a reduction in their value is possible.

We now consider tests of these iconic filters with input test images in fonts that were never seen during filter synthesis. Table VI shows the worst-case results for tests on input data in the font Scienam. As seen, only one error in all 62 characters occurred. Thus properly designed iconic filters can recognize test data that they have never seen. By including fonts of several selected characters, full 100% recognition is possible. The present tests were included to show performance with a limited training set.


1 June 1987 / Vol. 26, No. 11 / APPLIED OPTICS 2271


Table VI. Worst-Case New Font (Scienam) Test Results (Error From Ideal Level in Parentheses)

Input test             Response (and error) for filters F1-F4
character    Font      F1            F2             F3             F4
r                      0.48(0.18)    0.14(0.15)     0.98(0.01)     0.92(0.07)
s                      0.59(0.10)    0.:U(0.02)     0.28(0.05)     0.67(0.01)
s            Fimes     0.97(0.02)    0.90(0.05)     ().:«          0.58(0.05)
•>                     0.97(0.04)    a.izio.oi )    o.:i6au>;,)    1.01(0.02)
4                      0.91(0.02)    0.55(0.00)     (U>9iu.o:ii    o.:Ul(U)2)
c


Fig. 5. Typical noisy characters with different noise variances: (a) σ² = 0.08 (gray-level noise); (b) σ² = 0.1 (gray-level noise); (c) σ² = 0.24 (binary noise).

For practical optical realization the dynamic range of the filter function cannot be seven decimal digits as in digital simulations. To quantify the amplitude and phase dynamic range required in the frequency-domain iconic filter, we computed the filters used to digital machine accuracy and then quantized these filters to different numbers of amplitude and phase levels. The worst-case test results were analyzed for the correlation of our multilevel (L = 3) multiple (F = 4) filters in tests against the 62 characters in the training set font New York Times data. These results are typical of those obtained for other fonts. These results showed that a filter quantized in the frequency domain to 32 amplitude levels and 360 (1° resolution) phase levels in the frequency plane performed very well, with only two errors out of the 62 characters (96% recognition) with these low quantized filter levels. The use of slightly more amplitude levels and far fewer phase levels also yielded perfect 100% recognition. Other tests performed considered the uniformity of response of the input spatial light modulator used. These tests showed excellent performance for *>% worst-case variation in the spatial uniformity of the input image plane data. We found that up to 30% worst-case nonuniform spatial response in the input device could be tolerated and acceptable results still obtained. Other tests involved rotations of the input object, which showed no performance loss with several degrees of rotation of the input object.
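The amplitude and phase quantization test can be sketched for a single frequency-plane sample. Uniform quantization, the normalization to a maximum amplitude `max_amp`, and the function name `quantize` are assumptions; the source does not state how its quantization levels were spaced.

```python
import cmath
import math

def quantize(value, max_amp, n_amp, n_phase):
    """Quantize a complex filter sample to n_amp amplitude and n_phase phase levels."""
    amp, phase = cmath.polar(value)
    amp_step = max_amp / (n_amp - 1)          # levels span 0 .. max_amp
    q_amp = round(amp / amp_step) * amp_step
    phase_step = 2 * math.pi / n_phase        # 360 levels -> 1 degree steps
    q_phase = round(phase / phase_step) * phase_step
    return cmath.rect(q_amp, q_phase)

sample = cmath.rect(0.4, 0.3)                 # hypothetical filter sample
q = quantize(sample, max_amp=1.0, n_amp=32, n_phase=360)
amp_error = abs(abs(q) - 0.4)
phase_error = abs(cmath.phase(q) - 0.3)
```

Applying `quantize` to every sample of a filter computed at machine accuracy, and then repeating the recognition tests, is one way to reproduce the kind of experiment described above.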

VII.  Summary  and  Conclusion 

The  issue  of  large  class  object  recognition  has  been 
addressed.  New  filters  for  such  problems  have  been 
described  and  several  hierarchical  architectures  using 
them  have  been  discussed.  Attention  was  given  to 
filter  synthesis  problems  foreseen  when  the  number  of 
classes  is  large.  A  theoretical  basis  for  the  sidelobe 
and  noise  performance  of  such  filters  was  advanced 
and  quantified  by  experiment.  Initial  results  are 
quite  attractive.  Hierarchical  correlators  and  multi¬ 
level  multiple  iconic  filters  are  a  viable  and  attractive 
solution.  They  appear  preferable  to  an  exhaustive 
search  of  all  available  training  images.1"1  Training  set 
selection  can  reduce  the  number  of  images  necessary 
and  hence  clutter.  Proper  code  selection  can  improve 
performance and reduce various error sources. Near-perfect recognition of a large number of objects (~1000) with only four filters with moderate filter dynamic range requirements appears possible. Initial OCR tests have quantified these remarks.
ABSTRACT

A new and efficient real time technique to produce a string code description of the contour of an object, such as an (angle, length) = (φ, s) feature space for the arcs describing the contour, is detailed. We demonstrate the use of such a description for an aircraft identification problem case study. Our (φ, s) feature space is modified to include a length string code and a convexity string code. This feature space allows both global and local feature extraction. The local feature extraction follows human techniques and is thus quite suitable for a rule-based processor (as we discuss and demonstrate). Aircraft have generic parts and thus are quite suitable for the model-based description.

1. INTRODUCTION

Aircraft recognition is a classic pattern recognition problem recently surveyed [1]. Many feature spaces have been suggested for such multiple degree of freedom pattern recognition problems. These include: moments [2,3] (which require large dynamic ranges and are noise sensitive when made distortion-invariant); Fourier descriptors [4,5] (which still require feature extraction, computationally intensive matching lists, and which do not lend themselves to use of local information or features); and various curvature features. Our proposed technique handles global and local features, includes feature extraction with in-plane distortion-invariance, and avoids a large matching search.

We selected a string code description of the object. Other work with similar descriptions [6-9] has also been used and their VLSI realization discussed [10-12]. However, our string code description (φ, s) = (angle, length) of the arcs on the contour of an object is generated most efficiently and allows global and local feature space analysis. Global features are necessary for general problems and local features allow specific problems to be solved quite effectively. The local features we use correspond to specific object parts and thus allow rule-based analysis (since this is the manner in which humans achieve identification). Our edge description is different from the conventional chain code [9] and we do not convert the chain code to an (x, y) or other description as others [7] do early in the processing period. Our rule-based technique differs from syntactic [13] techniques. Our rule-base follows a forward chaining control flow as does SPAM [14]. As our model knowledge, we employ specific aircraft structural and part information.

Section 2 describes our case study, model base, and data base. Section 3 provides an introduction
and overview of our processor and our feature space. Section 4 details our new efficient feature space
generation technique and includes typical results. Section 5 briefly discusses our rule-based processor.


2.  DATA  BASE 


The case study we consider is the identification and orientation estimation of 10 different aircraft.
Fig. 1 shows the top-down views of these aircraft grouped by the functional role of the aircraft. In our
tests, all aircraft are 128 x 128 pixels in resolution. Our model base contains different polygon
descriptions of all aircraft and their parts, from which any aspect view can be produced quite easily
[15].


3.  PREPROCESSOR  OVERVIEW 


Our full processor contains five major sections, as shown in Fig. 2. The preprocessor performs edge
enhancement (this is necessary to produce good peaks in the Hough transform space we will employ)
and generates a clockwise ordered list of pixel coordinates for the contour or boundary of the object
(using classic techniques [16,17]). The feature space produced is a (φ, s) description of the angle (φ)
and the length (s) of all arcs, taken clockwise, in a string code description of the connected object
boundary or contour. An aspect estimator unit determines if the aircraft is being viewed nearly top-down
or if an out-of-plane distorted image is being investigated. A rule-based or an associative processor is
then used (depending upon the aircraft object's distortions). In this present paper, we discuss the
rule-based processor. Thus, in this initial work, we restrict attention to nearly top-down aircraft views.

4.  EFFICIENT  (φ,  s)  STRING  CODE  FEATURE  SPACE  GENERATION

The first step is to reduce the clockwise ordered contour pixel list to N (approximately 20-30)
vertices. Fig. 3 shows a DC10 (Fig. 3.a) and its boundary description with the vertices noted (Fig. 3.b).
The N vertices define N arcs for the boundary, each with a length (s) and an internal angle (φ).
Fig. 3.c defines the angle φ. The result is a (φ, s) string code.

The block diagram of our efficient (φ, s) string-code generation system is shown in Fig. 4. We use
the clockwise-ordered contour list of the boundary pixels (x, y), form the Hough transform (HT) of the
input from the original data, and locate the six major (and true) HT peaks and their (ρ, θ) values.
We then Hough transform each contour pixel and check if it evokes a peak at one of the six
major HT peak (ρ, θ) parameter locations. This assigns most contour points to the six major lines in the
image and automatically gives (without time-consuming trigonometric operations) the angle φ and the
length (s) of these lines. Only a small fraction of the pixel points in the contour list remain to be
assigned φ and s values. Each of these is a connected set of pixels that lies in a gap between
previously assigned points. We achieve the (φ, s) description of these pixels as lines by a
conventional split-line fitting method [18,19]. This split-line technique is computationally expensive,
but (with the six major lines and our HT technique) it needs only to be applied to a significantly
reduced number of points in the contour list. Thus, this technique generates the full (φ, s) string code
description quite efficiently.
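
The gap-filling step can be illustrated with a short sketch. The split-line method of [18,19] is not reproduced in this paper, so the Python fragment below assumes a generic recursive split fit (in the style of the Douglas-Peucker algorithm) as a stand-in. The function names and the tolerance `tol` are hypothetical.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm

def split_fit(points, tol=1.5):
    """Return indices of the vertices approximating an ordered pixel run.

    Assumed stand-in for the split-line method: split at the pixel
    farthest from the chord until every pixel lies within tol of a chord.
    """
    def recurse(i, j):
        worst, worst_d = None, tol
        for k in range(i + 1, j):
            d = point_line_distance(points[k], points[i], points[j])
            if d > worst_d:
                worst, worst_d = k, d
        if worst is None:
            return [i, j]
        left = recurse(i, worst)
        return left[:-1] + recurse(worst, j)
    return recurse(0, len(points) - 1)

# An L-shaped gap: one horizontal run followed by one vertical run.
gap = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
vertices = split_fit(gap)
```

In this sketch the returned indices mark the vertices of the fitted lines; the lengths s and angles φ of the arcs in the gap would then follow from consecutive vertices.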

A HT converts lines in the input into points in a (ρ, θ) parameter Hough space, i.e. coordinates
corresponding to the normal distance (ρ) and the angle (θ, with respect to the x axis) of the normal of
the line, with peak heights proportional to the number of points on the line (or the length of the


Figure 2: Overall Processor

Figure 3: Example of vertices describing an object boundary (Fig. 3.a and 3.b)
as arcs of length s and internal angle φ (φ is defined in Fig. 3.c).
Panels: (a) DC10, (b) DC10 vertices, (c) Angle φ definition

Figure 4: Block diagram of an efficient (φ, s) string code processor
(blocks: Preprocessor, Feature Space Generation)

line). Fig. 5.a shows the HT for the DC10 with the nose vertical. Fig. 5.b shows the HT for the DC10
with the nose horizontal. The two major peaks in Fig. 5.a lie on the θ = 0° line and in Fig. 5.b
on the θ = 90° line. These two major peaks denote the presence of the fuselage and its orientation. In
Fig. 5, we see six major peaks; however, this does not always occur (when noise, quantization,
image resolution, and 3-D roll and pitch distortions occur). To demonstrate this and techniques to
overcome these problems, we show (in Table 1) the 10 largest HT peaks obtained for the DC10
oriented at 120°. This demonstrates specifically that the largest six HT peaks do not correspond to
the major lines in the image; specifically, HT peaks 6 and 7 are false peaks that are larger than the
next largest true peak. We note [20] that such false peaks occur close to the true peaks
(within three pixels for our aircraft data). Thus, we employ an algorithm that ignores HT
peaks that lie within four pixels of a larger peak. Employing this rule, the six proper peaks
corresponding to the six major lines in the aircraft image emerged (Table 2). Table 3 lists the
aircraft lines corresponding to the six major HT peaks, and Fig. 6.a shows the lines in the aircraft
image itself. Fig. 6.b shows the resultant final (φ, s) image with all vertices obtained (including those
obtained by the split-line fitting technique).
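
The false-peak rule can be sketched as follows. This is a minimal Python illustration, assuming that peaks are given as (height, ρ, θ) triples in accumulator-cell units and that "within four pixels" means within four cells along both axes of the HT space. The function name and both conventions are our assumptions and are not taken from [20].

```python
def select_true_peaks(peaks, num_peaks=6, min_separation=4):
    """Keep the largest peaks, skipping any that lie near a larger kept peak.

    peaks: list of (height, rho, theta) in accumulator-cell units (assumed).
    """
    kept = []
    for height, rho, theta in sorted(peaks, reverse=True):
        near_larger_peak = any(
            abs(rho - r) <= min_separation and abs(theta - t) <= min_separation
            for _, r, t in kept
        )
        if not near_larger_peak:
            kept.append((height, rho, theta))
        if len(kept) == num_peaks:
            break
    return kept

# Hypothetical values: the second peak sits next to the first and is dropped.
candidates = [(100, 10, 20), (90, 12, 21), (80, 50, 60)]
selected = select_true_peaks(candidates, num_peaks=2)
```

The sketch does not handle the wrap-around of θ at the edges of the HT space, which a full implementation would need to consider.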

An efficient technique to assign the θ and ρ parameters of the six HT peaks to points in the contour
list is now detailed. To achieve this, we transform each pixel coordinate (x, y) in the clockwise
contour list into a sinusoid. This sinusoid need only be evaluated at the six θ values of the
dominant HT peaks and at the ρ coordinates within each. Thus, these HT operations on the contour
list are easily achieved. Since we expect a number of successive pixels in the contour list (those on
each arc) to correspond to the same HT peak point, the processor can be quite fast (and very efficient
compared to typical techniques involving extensive trigonometric calculation).
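
A minimal Python sketch of this assignment step is given below. It assumes the normal form ρ = x cos θ + y sin θ for the sinusoid, peaks listed as (ρ, θ in degrees), and a hypothetical tolerance `rho_tol` on ρ; none of these names appear in the original system.

```python
import math

def assign_pixels_to_peaks(contour, peaks, rho_tol=1.0):
    """Label each contour pixel with the index of the HT peak it supports.

    contour: ordered list of (x, y); peaks: list of (rho, theta_degrees).
    Returns one label per pixel (None if the pixel supports no peak).
    """
    # The trigonometric terms are computed once per peak, not once per pixel.
    trig = [(math.cos(math.radians(theta)), math.sin(math.radians(theta)))
            for _, theta in peaks]
    labels = []
    last = None
    for x, y in contour:
        candidates = list(range(len(peaks)))
        if last is not None:
            # Successive pixels usually lie on the same arc: test it first.
            candidates.remove(last)
            candidates.insert(0, last)
        label = None
        for i in candidates:
            cos_t, sin_t = trig[i]
            if abs(x * cos_t + y * sin_t - peaks[i][0]) <= rho_tol:
                label = i
                break
        labels.append(label)
        last = label
    return labels

# Two hypothetical lines: x = 5 (theta = 0) and y = 3 (theta = 90).
lines = [(5.0, 0.0), (3.0, 90.0)]
pixels = [(5, 0), (5, 1), (5, 2), (7, 7), (0, 3), (1, 3)]
labels = assign_pixels_to_peaks(pixels, lines)
```

Pixels labeled None are those left for the split-line fitting step.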

We now discuss the descriptions we employ of the string code representation of the object as a
symbolic descriptor. We first consider the full (φ, s) string code with the exact analog values for the
angles and lengths. Next, we consider a convexity string code. This lists only the convexity of the
angles of the arcs in the boundary representation as convex V (if φ < 180°) or concave C (if φ >
180°). Last, we consider a length string code, which lists only the length of each arc as: very short,
short, medium, long, or very long. These are expressed in terms of the maximum difference
Δ = Lmax − Lmin in the length L of the arcs for the input image. Each length region is Δ/6 in extent,
except for the medium length region, which is Δ/3 in extent. These different symbolic string code
descriptions of the object contour are found to be quite useful for global and local rule-based
processing, as described in Section 5.
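
The two symbolic codes can be sketched in a few lines of Python. The ordering of the five length regions from Lmin upward, the treatment of an angle of exactly 180° as concave, and the label strings are assumptions made for this illustration only.

```python
def convexity_code(angles_deg):
    """'V' for a convex vertex (angle < 180), 'C' otherwise (assumed for 180)."""
    return "".join("V" if angle < 180 else "C" for angle in angles_deg)

def length_code(lengths):
    """Quantize arc lengths into VS, S, M, L, VL regions.

    Regions are D/6 wide except the medium region, which is D/3 wide,
    where D = max(lengths) - min(lengths).
    """
    shortest = min(lengths)
    delta = max(lengths) - shortest
    if delta == 0:
        return ["M"] * len(lengths)
    # Upper bounds of the first four regions as fractions of delta.
    bounds = [(1 / 6, "VS"), (2 / 6, "S"), (4 / 6, "M"), (5 / 6, "L")]
    codes = []
    for s in lengths:
        fraction = (s - shortest) / delta
        for upper, name in bounds:
            if fraction < upper:
                codes.append(name)
                break
        else:
            codes.append("VL")
    return codes

convexity = convexity_code([90, 200, 170])
length_labels = length_code([0, 1.5, 3, 4.5, 6])
```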

5.  RULE-BASED  PROCESSOR 


Our rule-based system employs if-then rules, a context-limited and rule-ordered control strategy,
and forward chaining with five rule groups used as we now describe. The first rule group (starting
rules) locates the fuselage.

The second rule group concerns substructure search rules. The purpose of this second rule group is
to locate all separate regions of an object and to divide them into left (L) and right (R) regions with
respect to the fuselage. We first extract the fuselage and all vertices corresponding to it. This
separates the contour list into L and R regions. We group these into separate connected regions
(closed polygon boundaries) corresponding to parts of the object. For each such region, we calculate
its area, perimeter, compactness, and its position with respect to the fuselage. Various rules are used
to determine the type of each region. Three representative examples are given below:

Table 1: Data on the 10 largest HT peaks for a DC10 with its nose at 120°

Table 2: Data on the six largest HT peaks using our false peak algorithm.
The six peaks noted are the correct ones.

Table 3: The six major lines in an aircraft

Corresponding Aircraft Part

Right Line on Fuselage
Left Line on Fuselage
Right Front Wing Line
Right Rear Wing Line
Left Front Wing Line
Left Rear Wing Line

Figure 6: Aircraft image with (a) only the six major arcs and (b) all arcs

Rule 1: Wings are the largest regions in L and R. They must have the proper
spatial relationship to the fuselage.

Rule 2: If the convexity symbolic code for a region has all vertices convex,
then this region is a wing with no engines etc. on it.

Rule 3: If the convexity symbolic code for a region has two concave vertices
out of four adjacent vertices and if these correspond to short arcs, then this
region is a wing with an engine etc. on it.

From the location of the concave vertices and arcs of short length, the position of the engine etc.
(referred to as a "blob") or small structure on the wing (or fuselage) can be determined. We discuss
this further below. Fig. 7 shows an example of a wing region with no engine (Fig. 7.b) as detected from its
convexity code (Fig. 7.a). Fig. 8 shows an analogous example in which the convexity code (Fig. 8.a) shows
several C sections and hence indicates the presence of an engine in the image of Fig. 8.b. Following
such rules, we can segment the L and R regions into parts as shown in Fig. 9 (wings, tails, and blobs).
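
Rules 2 and 3 can be written as a short test on the symbolic codes. The Python sketch below assumes that the region boundary is closed (so the window of four adjacent vertices wraps around), that each concave vertex carries the length label of its own arc, and that "short" covers both the very short and short regions. These conventions and all names are assumptions for illustration; the actual rule base is not listed in this paper.

```python
def classify_wing_region(convexity, length_labels):
    """Apply sketches of Rules 2 and 3 to one wing region.

    convexity: string of 'V'/'C', one character per vertex.
    length_labels: one of 'VS', 'S', 'M', 'L', 'VL' per vertex (assumed pairing).
    """
    if "C" not in convexity:
        return "wing without blob"          # Rule 2: all vertices convex
    n = len(convexity)
    if n >= 4:
        for start in range(n):
            window = [(start + k) % n for k in range(4)]
            concave = [i for i in window if convexity[i] == "C"]
            short = all(length_labels[i] in ("VS", "S") for i in concave)
            if len(concave) == 2 and short:
                return "wing with blob"     # Rule 3: two short concave arcs
    return "undetermined"

plain_wing = classify_wing_region("VVVV", ["L", "M", "L", "M"])
engine_wing = classify_wing_region(
    "VVCVCVVV", ["L", "L", "S", "S", "VS", "L", "L", "L"])
```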

The third rule group we use provides a check on the top-down orientation estimation (this is
obtained from the number of regions in L and R, the areas of these regions, and the symmetry of the
L and R sections), yaw estimates (these are obtained from the θ coordinate of the fuselage peak in the
HT space), and roll estimates (from the symmetry or ratios of areas in regions L and R).

The fourth rule group concerns substructure rules. These are intended to identify the small or local
features or object regions or parts. The best example of this concerns "blobs" on wings and
specifically whether these are engines, missiles, or fuel tanks. For the image data base we considered,
we note (from Fig. 1) that if the blob appears in the center of the wing, it is an engine (e.g.
DC10); and if it appears on the tip of a wing, it is a missile (e.g. F104).

The fifth rule group contains classification rules. There are approximately 40 rules used in total.
We note three examples below; these are intended to be representative. Before
discussing these, we note one additional parameter included in our feature space: the
angles φ1 and φ2 that the wings make with the fuselage at points A and B (see Fig. 10).

Figure 7: Example of a convexity code (a) for a wing region with no engine (b)

Figure 8: Example of a convexity code (a) for a wing region with an engine on it (b)

Figure 9: Representative left (L) and right (R) segmented regions of an aircraft


Figure 10: Definition of the internal angles φ1 and φ2
at vertex points A and B in an aircraft

Using these blob and angle parameters, we note three rules as examples:

Rule 1: If a blob is present on a wing, and if it is an engine (i.e. in the center
of the wing), and if the angle at vertex A (Fig. 10) > 245°,
then the aircraft is a Swearingen.

Rule 2: If a blob is present on a wing, and if it is an engine, and if the angle φ1
at vertex A < 245°, then the aircraft is a DC10.

Rule 3: If a blob is present on a wing, and if it is not an engine (i.e. it exists
at the tip of the wing), then the aircraft is an F104.

Comparison of the Fig. 1 images and these rules shows that these rules correctly classify the aircraft
noted.
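
The three example rules translate directly into if-then code. The following Python sketch is illustrative only: the argument names and the string labels for the blob position are hypothetical, and the case of an angle of exactly 245° (which the rules do not cover) is returned as unclassified.

```python
def classify_aircraft(blob_position, angle_a_deg):
    """Apply the three example classification rules.

    blob_position: 'center' (engine) or 'tip' (not an engine); hypothetical labels.
    angle_a_deg: internal angle at vertex A, in degrees.
    """
    if blob_position == "center":
        if angle_a_deg > 245:
            return "Swearingen"   # Rule 1
        if angle_a_deg < 245:
            return "DC10"         # Rule 2
        return None               # exactly 245 degrees: not covered by the rules
    if blob_position == "tip":
        return "F104"             # Rule 3
    return None

result = classify_aircraft("center", 250.0)
```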

6.  SUMMARY  AND  CONCLUSION

We have advanced an efficient HT technique to assign lengths and angles of most arcs to a
clockwise pixel coordinate list of the contour or boundary points. This is complemented by a split-line
fitting algorithm which need be applied only to small gaps in the residual boundary. For the case
study of an aircraft data base (which is very suitable for model-based description), we separate the
object into L and R regions, each described by connected polygons, each of which is identified as
wings, tails, fuselage, engines, etc. Convexity and length symbolic string codes aid this separation.
This feature space is most efficiently obtained, and it allows us to apply both global features (suitable
for general pattern recognition) and local features (necessary to handle distorted objects and partial
images). The local features used correspond to specific object points (easily obtained and described in
our symbolic notation) that humans also relate to. The feature space and case study considered
(aircraft identification) lend themselves naturally to a rule-based processor. Examples of rules and their use
in the identification of aircraft classes were provided. The general technique is most flexible. When
augmented with an associative processor, the potential of the system is even further increased.

Real-time deformation invariant optical pattern
recognition using coordinate transformations


David Casasent, Shao-Feng Xia, Andrew J. Lee, and Jian-Zhong Song


The well-known scale and rotation invariant polar-logarithmic coordinate transformation is used to achieve
in-plane distortion invariant pattern recognition. The coordinate transform is produced by a computer
generated hologram on a laser printer. Attention is given to weighting terms in the output and their effect on
resolution and the number of input plane pixels removed near the origin. The optically produced coordinate
transformed input pattern is interfaced to a correlator by a pocket liquid crystal TV to provide real-time
processing. Experimental results are included.


I.  Introduction 

Optical pattern recognition using a matched spatial
filter and a correlator is a well-known technique.¹ It is
advantageous due to its high speed and parallel processing.
But the conventional correlator cannot recognize
scaled or rotated images of the reference object.
For example, for a 1% scale change of the reference
object, the SNR of the resultant correlation peak can
be 10 dB down from that of the autocorrelation, and a
20-dB loss can occur for a 1.7° rotation of the input
from the reference.² This disadvantage limits the potential
applications of the conventional correlator.
One solution to these problems is development of a
space variant optical processor which is realized by
applying a coordinate transformation preprocessing
operation to the input and reference data.⁸ Coordinate
transformations, such as the logarithmic transformation
(which results in a Mellin transformation,
which is scale invariant), the polar (r − θ) transformation
(which results in rotation invariance), and the
combination of the two (the ln r − θ coordinate transformation,
which results in scale and rotation invariance),
have been reported.
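
The invariance property can be checked numerically for a single point. The Python sketch below is only an illustration of the ln r − θ mapping; the helper names are hypothetical and the angle convention (counterclockwise, via `atan2`) is an assumption.

```python
import math

def lnr_theta(x, y):
    """Map an input point (x, y) to its (ln r, theta) coordinates."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

def scale_rotate(x, y, a, beta):
    """Scale a point by a and rotate it by beta radians about the origin."""
    c, s = math.cos(beta), math.sin(beta)
    return a * (x * c - y * s), a * (x * s + y * c)

u0, t0 = lnr_theta(3.0, 4.0)
u1, t1 = lnr_theta(*scale_rotate(3.0, 4.0, a=2.0, beta=0.3))

# A scale change a appears as a shift of ln(a) along the ln r axis,
# and a rotation beta appears as a shift of beta along the theta axis.
shift_u = u1 - u0
shift_t = t1 - t0
```

Since every point of a scaled and rotated pattern is displaced by the same amount in the (ln r, θ) plane, a shift invariant correlator applied to the transformed pattern becomes scale and rotation invariant.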

Here we report the optical implementation of deformation
invariant real-time optical pattern recognition
using a computer-generated hologram (CGH) and a
liquid crystal television (LCTV). The CGH is used
with a Fourier transform lens to perform the ln r − θ
coordinate transformation. The use of a hologram
consisting of many interferometrically produced holographic
optical elements (HOEs) for coordinate transforms
has been demonstrated.⁴ The principle of using
a CGH for a coordinate transformation was demonstrated
earlier for the Mellin transform and for the
circle-to-point and ln r − θ transformations. A discussion
of the fabrication of our CGH is presented in
Sec. II together with several issues associated with the
optical coordinate transformation and their effects on
our real-time correlator. The LCTV and a TV camera
are used to connect the coordinate transform preprocessing
system to a conventional optical matched spatial
filter correlator in real time. The LCTV introduces
a phase distortion in the wavefronts passing
through it which has been corrected using a phase
conjugate filter.⁸ Real-time scale and rotation invariant
pattern recognition is demonstrated experimentally
in Sec. III. Our conclusions are advanced in Sec. IV.

II.  Design  of  the  Coordinate  Transformation  CGH 

The system to achieve the ln r − θ coordinate transformation
is shown in Fig. 1. The input f(x,y) is placed
in contact with a continuous phase CGH with transmittance
h(x,y) = exp[jφ(x,y)], where φ(x,y) is the
phase distribution of the phase filter. Lens L1 forms
the Fourier transform of the product f(x,y)h(x,y) at
the plane P1, where we find

$$F(u,v) = \iint f(x,y)\,\exp[j\phi(x,y)]\,\exp[-j(2\pi/\lambda f_L)(xu + yv)]\,dx\,dy, \qquad (1)$$

where λ is the wavelength of the laser used, and f_L is the
focal length of lens L1. For the ln r − θ coordinate
transformation, we desire

$$u(x,y) = \ln (x^2 + y^2)^{1/2} = \ln r,$$
$$v(x,y) = -\tan^{-1}(y/x), \qquad (2)$$

and the integral in Eq. (1) can be solved using the
approximate saddle point integration method.⁹ For
the coordinate transform in Eq. (2), a continuous
phase solution φ(x,y) exists since u(x,y) and v(x,y)
have continuous partial derivatives and since the partial
derivatives of u with respect to y and of v with
respect to x are equal. The desired phase function is

$$\phi(x,y) = (2\pi/\lambda f_L)\,[x \ln (x^2 + y^2)^{1/2} - y \tan^{-1}(y/x) - x]. \qquad (3)$$
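
As a check on the phase function, the stationary-phase (saddle point) condition for Eq. (1) requires that the gradient of φ equal (2π/λf_L)(u, v). The short derivation below assumes that Eq. (3) has the form φ = (2π/λf_L)[x ln r − y tan⁻¹(y/x) − x] with r = (x² + y²)^{1/2}, and it is offered only as a verification sketch.

```latex
% Assumes Eq. (3): \phi = (2\pi/\lambda f_L)[x \ln r - y \tan^{-1}(y/x) - x],
% with r = (x^2 + y^2)^{1/2}.
\begin{align*}
\frac{\partial\phi}{\partial x}
  &= \frac{2\pi}{\lambda f_L}\left[\ln r + \frac{x^2}{r^2} + \frac{y^2}{r^2} - 1\right]
   = \frac{2\pi}{\lambda f_L}\,\ln r
   = \frac{2\pi}{\lambda f_L}\,u(x,y), \\
\frac{\partial\phi}{\partial y}
  &= \frac{2\pi}{\lambda f_L}\left[\frac{xy}{r^2} - \tan^{-1}\frac{y}{x} - \frac{xy}{r^2}\right]
   = -\frac{2\pi}{\lambda f_L}\,\tan^{-1}\frac{y}{x}
   = \frac{2\pi}{\lambda f_L}\,v(x,y).
\end{align*}
```

Both derivatives reproduce the desired mapping of Eq. (2), so each input point (x, y) contributes mainly to the output point (u, v) = (ln r, −tan⁻¹(y/x)).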


A.  CGH  Design 

There are several techniques that may be used to
form the desired phase filter.¹⁰ Since the amplitude
transmittance of h(x,y) is one, we need only record the
phase function, and since this is recorded by positioning
the data on the mask, binary CGH recording techniques
can be used. Since a continuous phase function
solution exists, we thus use a binary computer-generated
interferogram¹¹ for the CGH. The interferogram
is the interference pattern of φ(x,y) and a plane wave
reference at an angle θ. The maxima of this interference
pattern (the locations of the interference fringes
or the lines that must be plotted on the CGH) must
satisfy

$$2\pi\alpha x - \phi(x,y) = 2\pi n, \qquad (4)$$

where n is an integer which denotes different fringes
and where the carrier frequency α = (sin θ)/λ. The
recorded CGH is generally photoreduced onto film,
and Eq. (4) describes the final CGH. To avoid overlapping
between the first-order and second-order diffracted
waves in the diffraction plane P1, α must satisfy¹¹

$$\alpha > (1.5/\pi)\,\left|\partial\phi/\partial x\right|_{\max}. \qquad (5)$$

Inserting Eq. (3) into (5) with x_max and y_max being the
maximum size of the input image or the CGH, we
obtain

$$\alpha > (3/\lambda f_L)\,\left|\ln (x_{\max}^2 + y_{\max}^2)^{1/2}\right|. \qquad (6)$$

This result has not previously been given full attention
and is of concern since it affects resolution, as we
discuss in Sec. II.C. We note that we detect only the
first-order diffraction pattern at P1. In the experiments
that we performed, we used the parameters x_max
= 5 mm, y_max = 5 mm, λ = 0.6328 μm, and f_L = 400 mm.
From Eq. (6), we then find that α > 23 line pairs/mm is
required. We used n = 400 fringes in Eq. (4) for α = 40
line pairs/mm. We solved for the various (x,y) that
satisfy Eq. (4) for each value of n, connected these
points, plotted the associated lines on an Imagen 300
laser printer, and then photoreduced the plot to the
final CGH size of 10 × 10 mm.
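
The carrier frequency requirement can be checked with a few lines of Python. The sketch assumes that Eq. (6) has the form α > (3/λf_L)|ln(x_max² + y_max²)^{1/2}| with all lengths expressed in millimeters; the function name is hypothetical. Under this assumption the quoted bound of 23 line pairs/mm is reproduced.

```python
import math

def min_carrier_frequency(x_max_mm, y_max_mm, wavelength_mm, focal_mm):
    """Lower bound on the carrier frequency (line pairs/mm), assumed form of Eq. (6)."""
    r_max = math.hypot(x_max_mm, y_max_mm)
    return 3.0 / (wavelength_mm * focal_mm) * abs(math.log(r_max))

# Parameters quoted in the text: 5 mm half-widths, 0.6328 um laser, 400 mm lens.
alpha_min = min_carrier_frequency(5.0, 5.0, 0.6328e-3, 400.0)

# 400 fringes across the 10 mm CGH give the carrier frequency actually used.
alpha_used = 400 / 10.0
```

With these values alpha_min is slightly above 23 line pairs/mm, so the 40 line pairs/mm carrier leaves a comfortable margin.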

B.  Space  Bandwidth  Product  Requirements 

This ln r − θ input image representation space (that
is scale and rotation invariant) is detected by a TV
camera at P1 of Fig. 1, and the electronic output from
the TV camera is then fed to an LCTV in the input
plane of an optical matched spatial filter frequency
plane correlator. We now relate the space bandwidth
product (N_r × N_θ) required in the ln r − θ space at P1 to
the input image space bandwidth product N × N = N²
at P0. The radial Δr and angular Δθ spatial sampling
increments are both √2/N, i.e., a factor of √2 larger than
the reciprocal of the number of input samples N. Including
the effect of the number of samples M omitted
near the origin of the input image pattern, we find

$$N_r = N \ln(N/M)/\sqrt{2}, \qquad N_\theta = (4N/\sqrt{2}) \tan^{-1}(N/2). \qquad (7)$$

These results follow from others¹¹ extended to the case
of an ln r − θ transform. The 2-D space bandwidth
product required at P1 to sample adequately the ln r − θ
plane is thus

$$N_r N_\theta = 2N^2 \ln(N/M) \tan^{-1}(N/2) = \pi N^2 \ln(N/M), \qquad (8)$$

where the final result follows for large N.

Fig. 1. Schematic of optical coordinate transformation system.

C.  Intensity  Weighting  Effects 

To evaluate Eq. (8), we must select M. To do this, we
consider the weighting present at P1 and then obtain a
new criterion for selection of M and hence the P1 resolution
required for a given input P0 resolution. The
intensity of each transformed point (u_a, v_a) in the P1
output F(u,v) is¹⁰

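$$|F(u_a,v_a)|^2 \propto \frac{|f(x_a,y_a)|^2}{|\phi_{xx}\phi_{yy} - \phi_{xy}^2|} \propto |f(x_a,y_a)|^2\,(x_a^2 + y_a^2), \qquad (9)$$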

where φ_mn denotes the partial derivative of φ(x,y) with
respect to m and n and where (x_a, y_a) is the input point
in P0 that contributes to the output point (u_a, v_a) in P1.
From Eq. (9), we see that the P1 pattern associated
with a given input P0 point depends on the intensity of
each input point and its position in P0. Our concern is
the effect of the positional weighting factor given by
the square radius r² = (x² + y²) of each input point in
P0. The effect of the r² = (x² + y²) weighting factor is
best described for the case of an input f(x,y) pattern of
uniform intensity. In this case, points further from
the optic axis in P0 will be brightest in the coordinate
transform pattern at P1, and points near the center of
P0 (near r = 0) will be the dimmest in P1. This is
attractive since these points must be omitted in the ln r
coordinate transform. Tapering of the input illuminating
light can conceptually correct this effect (except
near r = 0, which is not of concern since this region
is blocked). Without correction for this effect, a
scaled input image will result in the same shaped P1
pattern but with a different intensity (larger intensity
if the input object is larger). When this transformed
pattern is used in a correlator, the r² weighting is of no
concern, since the matched filter would also include
the same r² weighting.

Our present concern with the r² weighting term in
Eq. (9) is its effects on M and the size of each diffraction
order in P1. Points near r = 0 in P0 map to high
frequencies, and these frequencies approach infinity
for P0 points approaching r = 0. Thus separation of
diffraction orders at P1 becomes impossible and requires
an increasing α unless M points near r = 0 are
omitted at P0. The α calculations in Eqs. (5) and (6)
considered such issues but do not readily allow one to
select M. Fortunately, the transform intensity of the
points near r = 0 is so weak due to the r² attenuation
factor in Eq. (9) that they can be ignored, and thus P1
diffraction orders of finite size result. If we assume
that plane P1 intensities for which the weighting factor
in Eq. (9) is less than 1% of the maximum can be omitted, we
find that this corresponds to N/M = 10 in Eq. (8). The
space bandwidth product N_r N_θ at P1 is now related to
that of the input (N²) by N_r N_θ = 7.2N². In our
system, the coordinate transformed image at P1 is fed
into the LCTV, which has a square resolution of 120 ×
120. For this output P1 space bandwidth, the resolution
that our CGH can accommodate is ~40 × 40. The
choice of M affects the amount of scale change that the
system can accommodate, but our N/M = 10 choice is
sufficient for a large range of scale.
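
These numbers can be reproduced with a short calculation. The Python sketch below assumes the large-N form N_r N_θ = πN² ln(N/M) of Eq. (8) and an r² weighting relative to the edge of the input; the variable names are ours.

```python
import math

# An r^2 weighting below 1% of its maximum means (M/N)^2 = 0.01, i.e. N/M = 10.
n_over_m = 1.0 / math.sqrt(0.01)

# Space bandwidth product at P1 per unit N^2, from the large-N form of Eq. (8).
sbp_factor = math.pi * math.log(n_over_m)

# Input resolution N supported by a 120 x 120 LCTV at P1.
n_input = math.sqrt(120 * 120 / sbp_factor)
```

The factor evaluates to about 7.2 and the supported input resolution to about 45 × 45 samples, consistent with the ~40 × 40 figure quoted above.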

Another issue of potential concern is the intensity of
the output from a correlator with P1 as an input.
When the input image rotates, the transformed output
image is cyclically displaced along the vertical axis at
P1. In a correlator, this can result in two correlation
peaks rather than one. The intensity of the two peaks
will sum to the intensity of the single autocorrelation
peak, and one peak will always be at least 50% of the
intensity of the autocorrelation peak. This effect can
be avoided by synthesizing a CGH and the matched
spatial filter to cover a rotation range from 0 to 4π
rather than 0 to 2π. For the case of a scaled input, the
P1 pattern shifts horizontally depending on the scale
factor, and the intensity of the pattern increases for scale
increases. A correlation output threshold set based on
the minimum scale expected (this also affects the
choice of M) should thus be used (or different correlation
plane thresholds can be used for different vertical
correlation plane coordinates). Alternatively, from
the dc value of the Fourier transform of the coordinate
transform of the input, an estimate of the energy of the
object is available and can be used to set an adaptive
correlation threshold.

III.  Real-Time  Deformation  Invariant  Correlation  Results 

The CGH was tested in the system of Fig. 1 with
various input aircraft and letter images. The output
P1 pattern in Fig. 1 was seen to remain unchanged
(except for shifts) for rotations and scale changes in the
P0 input images. The P1 output transformed pattern
was found to shift horizontally by ln a for input scale
changes a and to shift cyclically vertically proportional
to input rotations. This verified the use of the CGH
for the desired ln r − θ coordinate transform.

Fig. 2. Real-time optical correlator system schematic.

To perform deformation-invariant optical pattern
recognition in real time, a spatial light modulator such
as the LCTV is required to record the input P0 pattern
and often also the coordinate transformed pattern at
P1 of Fig. 1. If the P1 data are used as a feature space,
the system is modified slightly¹² to provide a shift
invariant P1 output which can then be detected and fed
to a feature extractor and classifier. In this paper, we
concern ourselves with the case when the P1 data are fed
to the input of a correlator (as shown in Fig. 2). In this
case a device such as an LCTV is required to contain
the P1 data from the system of Fig. 1. We achieved
this by feeding the TV detected output of the P1 pattern
of Fig. 1 to an LCTV at P1 of Fig. 2. The phase
errors of the LCTV are corrected for by the phase
conjugate hologram (PCH) shown.⁸ A matched spatial
filter of the coordinate transformed object to be
recognized is formed at P2 with the beam balance ratio
chosen to yield the optimal correlation SNR. The
output correlation is produced at P3, where it is detected
by a camera and displayed on an isometric display.
The aperture passes only the first-order diffracted
pattern from P1. (Several diffracted orders exist
due to the regular pattern of pixels on the LCTV.)
This removes the effect of the fixed LCTV pattern and
improves the SNR of the output correlation obtained.
The video output from the camera in P1 is amplified
and partly saturated to improve the output display
and reduce the r² weighting factor in Eq. (9).

The  results  of  our  real-time  experiments  on  the  sys¬ 
tems  of  Figs.  1  and  2  demonstrating  scale  and  rotatior 
invariant  pattern  recognition  are  now  discussed.  Fig¬ 
ure  3  demonstrates  rotation  invariance.  Figure  3(a) 
shows  the  original  input  image  used,  the  letter  X,  and 
Fig.  3(b)  shows  the  autocorrelation  of  its  coordinate 
transformed  pattern  with  the  peak  in  the  center  of  the 
Pi  correlation  plane.  The  size  of  the  input  characters 
was ~50% of the input field of view with an equivalent
resolution  of  ~20  X  20  pixels  in  P,  of  Fig.  1.  Figure 
3(c) shows the (ln r, θ) coordinate transform of Fig. 3(a). This was
used to synthesize the matched spatial filter at P2 of Fig. 2.
Figures 3(d) and (e) show the isometric displays of the P3 output
correlation plane for 30° rotations of the input image clockwise
[Fig. 3(d)] and counterclockwise [Fig. 3(e)], respectively. These
figures show a large correlation peak whose shape and peak value are
quite constant. This indicates the occurrence of the reference object
in the input image. The output correlations clearly demonstrate that
the correlation peak is maintained under input rotations and that it
is displaced up or down in proportion to the rotation of the input
pattern.

Fig. 4. Real-time laboratory scale invariant object recognition and
cross-correlation data: (a) input object, 130% scale change from
reference; (b) correlation of the coordinate transformed input of (a);
(c) input object, 70% scale change from reference; (d) correlation of
the coordinate transformed input of (c); (e) input object different
from the reference object; (f) correlation of the coordinate
transformed input of (e).

The scale invariance of our real-time system is demonstrated in
Fig. 4. The same original image and matched spatial filter were used
[Fig. 3(a)]. Its coordinate transformed pattern [Fig. 3(c)] and
autocorrelation [Fig. 3(b)] were shown earlier. Scaled versions of
the reference input, as shown in Figs. 4(a) and 4(c), with scale
factors of 1.3 and 0.7, respectively, were used as inputs. Figures
4(b) and (d) show the isometric displays of the corresponding output
correlation planes. Note in these figures that the correlation peaks
are still largely unchanged in shape and are now displaced in the
horizontal direction from that of the autocorrelation in Fig. 3(b),
in proportion to the logarithm of the scale change of the input. We
note also that the value of the correlation peak varies with the
scale of the input, as expected, since a larger pattern (containing
more energy) results in more energy in the output plane. The cross
correlation of an unknown input image [Fig. 4(e)] with the matched
filter of the coordinate transformed reference yields negligible
output [Fig. 4(f)], as expected, since the coordinate transformation
is one-to-one and thus does not increase the cross-correlation
response.
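
The displacement behaviour reported above follows directly from the
geometry of the (ln r, θ) mapping, and a short numerical check may help
make it concrete. The sketch below is only an illustration in Python:
the function names and the sample point are assumptions introduced
here, and it models a single point rather than the optical system. It
shows that a scale change moves a point along the ln r axis by the
logarithm of the scale factor, while a rotation moves it along the θ
axis by the rotation angle.

```python
import math

def ln_r_theta(x, y):
    """Return the (ln r, theta) coordinates of the Cartesian point (x, y)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

def scale_and_rotate(x, y, scale, angle_deg):
    """Scale a point about the origin and rotate it counterclockwise."""
    a = math.radians(angle_deg)
    return (scale * (x * math.cos(a) - y * math.sin(a)),
            scale * (x * math.sin(a) + y * math.cos(a)))

def shift(point, scale, angle_deg):
    """Displacement in (ln r, theta in degrees) caused by the distortion."""
    u0, t0 = ln_r_theta(*point)
    u1, t1 = ln_r_theta(*scale_and_rotate(*point, scale, angle_deg))
    return u1 - u0, math.degrees(t1 - t0)

point = (3.0, 1.0)  # arbitrary sample point, chosen only for illustration
for scale, angle in [(1.3, 0.0), (0.7, 0.0), (1.0, 30.0), (1.0, -30.0)]:
    d_ln_r, d_theta = shift(point, scale, angle)
    print(f"scale={scale}, rotation={angle:+.0f} deg -> "
          f"d(ln r)={d_ln_r:+.4f}, d(theta)={d_theta:+.1f} deg")
```

Because every point of the object is displaced by the same amount, the
transformed pattern shifts as a whole, which is why a shift-invariant
correlator can then locate it.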

IV.  Conclusions 

The use of an optical coordinate transform (CT) system, employing a
CGH and a lens, in series with a conventional optical correlator has
been demonstrated in real time for in-plane deformation invariant
pattern recognition. The CT system is interfaced to the correlator
system using an LCTV and TV camera to allow the system to process
data in real time.

In our system, the CT chosen performed the polar-ln r transform,
which yields scale and rotation invariance. The CGH used to perform
this transformation was detailed with attention to the recording
technique, space bandwidth required, and effects of an r^2 weighting
term. The scale and rotation invariant real-time correlation
performance of our system was experimentally demonstrated. The
results using the inexpensive LCTV are promising, and the use of
higher-resolution LCTVs should yield improved correlator performance
at modest expense.
Computer generated hologram recording using a
laser  printer 


Andrew  J.  Lee  and  David  P.  Casasent 


The  use  of  a  laser  printer  for  recording  various  types  of  computer  generated  holograms  is  discussed,  a 
results  are  presented. 


Computer generated holograms (CGHs) have a variety of uses in optical
information processing.1 Many CGH recording devices can be used,7 but
few are inexpensive and easily available to the researcher first
hand. Recording with a Calcomp plotter and subsequent photographic
reduction of the pattern is the most accessible form of CGH recorder.
However, it is limited in its flexibility, resolution, and
reproducibility, and it requires photographic reduction of large
20- × 20-in. patterns. The advent of laser printers and their reduced
costs make them attractive CGH recorders. We emphasize the use of the
Imagen 300
laser  printer,1 1  although  the  same  techniques  apply  to 
other  laser  printers. 

The Imagen 300 is commonly used to print letters and other documents
using word processing software. In this mode, the word processor
generates a file written in imPRESS code. This file is fed to an
image processor (IP) within the Imagen printer which stores and
interprets this file and converts it to a raster. This raster format
is necessary to control sequentially the writing laser beam. The
print engine within the Imagen contains the laser and optics which
print the information on paper as a high resolution binary pattern.
The imPRESS commands typically used define English and Greek
characters, fonts, and symbols. To employ the device for CGHs, the
user can employ imPRESS to define his own fonts by defining glyphs,
the basic cells used in halftone printing of greyscale imagery. The
user can also employ imPRESS commands that draw points, lines, and
arcs and perform area shading. We now detail two procedures we
have developed to use the Imagen printer fc
synthesis.  We  also  quantify  accuracy  measur 
taken  on  the  printer. 

There  are  a  large  variety  of  CGH  encodin 
niques  possible  to  produce  grey-scale  and  cc 
value  data  with  binary  recording  devices  such 
printers.  To  record  a  2-D  rectangular  array 
with  different  transmittance  at  each  point,  tl 
array  is  specified,  and  halftone  techniques  (usi 
defined  glyphs  or  the  shading  command  in  im 
are  used  to  produce  the  desired  transmittance 
point.  For  spatial  filtering  and  matched  spatia 
ing  applications,  other  encoding  techniques1 
possible  but  follow  from  the  above  basic  technii 
pictorial  example  of  a  halftone  encoded  ima 
duced  by  the  Imagen  printer  is  shown  in  Fi 
demonstrate  the  results  and  concept.  The  ima 
sists  of  190  X  190  glyphs  which  encode  64  differt 
levels. 

For  more  general  CGHs,  the  required  patte 
sists  of  a  set  of  curves,  each  described  by  an  ei 
and  (for  the  case  of  a  binary  pattern)  one  must  < 
all  the  points  satisfying  each  equation  and  prod 
resultant  plot  of  these  curves.  The  imPRESS  co: 
DRAW-PATH  draws  a  curve  through  a  number  of 
A  file  of  all  these  DRAW-PATH  commands  (one  I 
curve)  and  the  absolute  pixel  locations  of  pc 
each  are  then  produced.  This  is  referred  tc 
graphic  imPRESS  file.  This  file  is  then  sent 
Imagen  printer ’where  the  IP  interprets  the  ii 
commands  and  produces  a  raster  file  which  ( 
which  pixels  on  the  page  should  be  turned  on  ( 
black).  The  print  engine  then  produces  the  s] 
pattern.  An  example  of  such  an  output  is  si 
Fig.  2.  This  is  a  continuous  phase  binary  s\ 
CGH  that  implements  a  polar-log  coordinate  ti 
mation  on  a  2-D  input  image.  This  CGH  is  usi 
necessary  for  space-variant  scale  and  rotation 
ant  pattern  recognition.171 

One  issue  in  implementing  these  concepts 
absolute  pixel  positions  must  be  used  rather  tl 
Fig. 1. Grey scale image produced on the Imagen laser printer as an
example of a CGH with halftone grey level spatial transmittance.


ventional units (inches) of distance. This issue arises because, when
one uses a CGH in an optical system, its physical size must be
calculated and specified, and the CGH must be produced to exactly
this size. Another issue is that the imPRESS commands are written as
hex character pairs, and pixel locations are written as four hex
pairs. This introduces some difficulty in use and debugging if the
user is unfamiliar with hex representation and with the hex
description of all imPRESS commands (since to read an imPRESS file,
all hex characters must be converted to their decimal or command
equivalents). The first technique we use to generate the graphic
imPRESS file is to write FORTRAN subroutines that set (x = 0, y = 0)
at a given absolute pixel position and then convert all (x,y) pairs
from distance units to pixel values (by multiplying by the
300-pixel/in. resolution of the Imagen printer). The result is an
(x,y) sampling at 300 pixels/in. We have found this technique to be
the most accurate, although it is the most difficult to use and
debug. The second technique we use to generate the imPRESS file uses
DISSPLA graphics software called from a simple FORTRAN program. The
points (x,y) to be connected are left in inches (or any distance
unit), or as pixel indices, and are then connected via a series of
CONNPT commands. The software then converts these DISSPLA commands
into imPRESS commands and the (x,y) points into pixel indices. This
technique is much easier to use since the user need not know all
imPRESS commands or their hex equivalents, or how to convert from
inches to pixel indices to hex characters. However, each pixel on the
printed page is not separately controlled, and pixel points are not
always placed in the exact desired position (due to sampling and
interpolation deficiencies in the DISSPLA software). DISSPLA also
produces automatic margins, thus eliminating many possible points on
the edge of the page and hence reducing the total number of pixels
one can record. We now quantify many of the above remarks. These two
techniques and the Imagen system are shown in the block diagram flow
chart of Fig. 3.

Fig. 2. Imagen laser printer produced continuous phase binary
synthetic CGH that achieves a polar-log coordinate transformation.

Fig. 3. Block diagram of the two CGH synthesis techniques
the Imagen laser printer.

The pixel size, overlap of pixels, and positional accuracy of the
printed output are now addressed. The
300-pixel/in.  resolution  is  misleading,  since  adja 
pixels  overlap  to  provide  attractive  continuous  chf 
ters.  Measurements  by  us  indicate  that  a  pixel  is  C 
X  0. 175  mm-  and  that  adjacent  pixels  overlap  hori 
tally  and  vertically  by  ~50%  or  0.09  mm.  Thus 
center-to-center  spacing  of  adjacent  pixels  is  C 
mm.  and  each  pixel  is  0.175  mm  in  size  in  one  dit 
sion.  As  a  result,  the  sequence  of  three  pixels  as 
OFF.  on  will  not  show  a  central  OFF  pixel.  Thus 
printer  resolution  is  150  nonoverlapping  pixel 
However,  each  pixel  location  can  be  specified  to 
part  in  300/in.  There  is  a  slight  variation  in  the  w 
of  pixels  due  to  the  varying  density  of  the  toner, 
variation  is  quite  small  [(below  several  micronsl  a 
random  and  could  not  be  measured  with  our  avai! 
techniques, even after 20× magnification]. The absolute positional
reproducibility of points was tested by writing alternate pixels and
lines on the left and right side of a page and on the top and bottom
of a page. In all cases, straight lines resulted that were aligned
exactly to the desired pixel position. Thus the Imagen printer used
with the imPRESS commands is reproducible within the specified pixel
resolution and within the limits of our measurement accuracy. To
calibrate distances to pixels and to quantify the absolute positional
accuracy, the outline of a square 1200 × 1200 pixels in size was
recorded using the imPRESS commands. The two dimensions of the
resultant plot were measured to be equal within 0.5 mm (0.02 in.) or
3 pixels. Thus the absolute positioning accuracy of the printer is
3/1200 or 0.25% over a distance of 4 in. It is important to note that
the spatial size of the square CGH pattern was precise (i.e., 4 in.
within 0.02 in., corresponding to 1200 pixels) when written directly
by the imPRESS commands. When the same 4- × 4-in. square (1200 × 1200
pixels) was written using DISSPLA software to generate the imPRESS
file, the size of the square produced was 3.76 in. This is due to
sampling and interpolation effects in the DISSPLA software (whose
source code is not available). This represents no major problem,
since one simply scales the desired dimensions by 4/3.76 to obtain an
exact pattern size. The DISSPLA software is still not capable of
controlling each pixel on the final printed page and in the final
imPRESS file. To demonstrate this, we wrote a pattern of two ON
pixels separated by 1/300 in., 2/300 in., etc. and found the imPRESS
file generated to have scaling errors in the number of OFF pixels.
Thus, for best absolute accuracy with separate direct control of each
image pixel, direct use of the imPRESS commands is recommended.
However, for most CGHs with moderate resolution, the more user
friendly DISSPLA software synthesis technique will suffice.
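
The calibration arithmetic in this paragraph can be restated as a few
lines of code. The following is a minimal Python sketch, not the
FORTRAN routines described in the text; the function names are
hypothetical and the numbers are simply those quoted above (300
pixels/in., a 1200-pixel test square, a 3-pixel discrepancy, and a
3.76-in. DISSPLA output for a requested 4 in.).

```python
PIXELS_PER_INCH = 300  # nominal addressable resolution quoted in the text

def inches_to_pixels(distance_in):
    """Convert a distance in inches to the nearest absolute pixel index."""
    return round(distance_in * PIXELS_PER_INCH)

def relative_error(error_pixels, span_pixels):
    """Fractional positioning error over a span measured in pixels."""
    return error_pixels / span_pixels

def prescale(desired_in, requested_in=4.0, measured_in=3.76):
    """Dimension to request so that the printed size equals desired_in.

    The default ratio 4/3.76 is the correction reported for the
    DISSPLA-generated square; other software would need its own ratio.
    """
    return desired_in * requested_in / measured_in

print(inches_to_pixels(4.0))             # side of the test square in pixels
print(f"{relative_error(3, 1200):.2%}")  # positioning accuracy over 4 in.
print(round(prescale(4.0), 3))           # size to request for a true 4-in. square
```

The prescale step assumes the DISSPLA size error is a pure linear
scaling, which is what the 4/3.76 correction in the text implies.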

The final topic of concern is the number of points that one can
record. The IP within the Imagen printer produces the necessary
raster image from the imPRESS file. In conventional text writing, the
imPRESS commands used do not involve lines that extend more than a
fraction of an inch. Thus the print engine can (and does) start
printing before an entire page raster file has been produced in the
IP. In recording various CGHs, the last command in the imPRESS file
can involve points separated by a considerable distance (in the
extreme case, a command to draw a line from the top to the bottom of
the page). Thus, in CGH synthesis, the entire image raster file must
be complete before printing begins. We achieve this with a special
software
command  that  stops  the  print  engine  until  this  condi¬ 


tion  is  satisfied.  For  conventional  text  recordii 
command  is  not  used,  since  it  considerably  redu 
printing  speed  possible.  The  standard  IP  h 
kbytes  of  memory  for  storage  and  processing 
have  added  an  additional  several  megabytes  of  i 
ry  to  this  to  accommodate  high  resolution  lar 
CGH  synthesis.  For  an  8  X  10  =  80-in-  printin 
the  printer  can  support  80  (300)-  =  7.2-M  pixel 
CGHs.  The  need  for  a  large  memory  is  thus  imp 
in  CGH  synthesis. 
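
The pixel count quoted above, and the reason extra memory matters, can
be checked with a short calculation. This is an illustrative Python
sketch only: the helper names are hypothetical, and the byte count
assumes an uncompressed bitmap at one bit per pixel, which is an
assumption made here rather than a statement about how the Imagen IP
stores its raster.

```python
PIXELS_PER_INCH = 300

def raster_pixels(width_in, height_in, ppi=PIXELS_PER_INCH):
    """Number of addressable pixels in a width x height printing area."""
    return width_in * height_in * ppi ** 2

def raster_bytes(width_in, height_in, bits_per_pixel=1):
    """Storage for a full-page raster, assuming an uncompressed bitmap."""
    return raster_pixels(width_in, height_in) * bits_per_pixel // 8

pixels = raster_pixels(8, 10)   # the 8 x 10 in. area discussed in the text
print(pixels)                   # total pixel count
print(raster_bytes(8, 10))      # bytes needed under the 1 bit/pixel assumption
```

Under that assumption a complete page raster already approaches a
megabyte, before any working storage for interpreting the imPRESS
commands is counted.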

CGHs  are  increasing  in  use  and  popularity 
ease  with  which  they  can  be  produced  on  inexp 
and  generally  available  laser  printers  should 
CGH  techniques  to  more  researchers. 
Error-correction coding in an associative processor


Suzanne  Liebowitz  and  David  Casasent 


A  technique  for  encoding  binary  outputs  from  optical  filters  or  matrix  memories  used  in  an  assoc 
processor  for  object  recognition  is  discussed.  Binary  coded  output  vectors  (rather  than  unit  vectors  I  are 
and  considerably  improve  storage  capacity.  The  output  codes  or  matrix  memories  are  chosen  from  c 
theory  to  enable  error  correction  and  detection.  The  error  classification  rate  for  the  coded  sche 
compared  to  the  noncoded  version  for  different  amounts  of  noise  in  the  input  and  output  planes.  Discc 
of  extensions  to  more  classes,  more  errors,  and  multilevel  coding  are  included. 


I.  Introduction 

We describe a technique for using conventional coding theory to
enhance the capability of optical correlators for object recognition
and orientation determination. Three types of advanced filter that
have been suggested for use in an optical correlator are projection
filters,1 correlation filters,2 and peak-to-sidelobe ratio (PSR)
filters.3 Section II reviews the synthesis of these filters. Here, we
emphasize the use of projection filters and especially their ability
to encode multiple-class information. Several methods for
implementing associative memories have been detailed in the
literature.4-13 Some of these methods have been proposed for optical
implementation.7-13 In this work, we synthesize the associative
memory from projection filters. A recently suggested method of
optical associative memory synthesis used projection filters to form
a matrix with each filter as a column and optically computed the
required vector inner products in parallel.13 The output vector from
this memory is a code that describes the input vector. In our work,
the input vector is an object image-plane representation, and the
output code indicates the class of the object. We will use the term
class loosely, since each input image can either be a different
object or a different orientation of one object (or a combination of
both). Each bit in the output code corresponds to the output of one
of several filters (matrix columns). This associative memory
formulation is based on a multifilter classifies
technique.  In  Sec.  II  we  review  its  formulation 
realization  and  note  its  improved  storage  capacit; 

In  this  paper  we  further  enhance  this  metho 
designing  the  filters  and  the  output  codes  to  er 
error  detection  and  correction.  For  our  work,  wi 
binary  coding  theory  since  it  allows  for  easier  co 
tions/encodings.  Therefore,  the  outputs  of  the  fi 
can  only  be  set  to  0  or  1  (or  values  representing  0  oi 
for  example,  a  0  output  should  be  avoided).  The  e 
correcting  technique  used  in  our  work  is  present! 
Sec.  III.  Error-correcting  codes  are  advantaged 
situations  where  the  probability  of  a  bit  transitior 
a  binary  code,  this  is  the  probability  that  a  t 
incorrect)  is  small.  In  Sec.  IV  we  discuss  the  i 
model  used  in  our  work  and  the  theoretical  limits 
of  the  coding  techniques.  In  Sec.  V  we  presew 
results  of  simulations  tested  on  both  uncoded 
coded  outputs  with  a  data  base  consisting  of  le 
from  the  alphabet.  By  limiting  the  scheme  to  a  bi 
code,  we  lose  the  ability  to  handle  more  classes 
fewer  filters  as  is  possible  with  multilevel  coding 
Sec.  VI  we  will  discuss  how  multilevel  coding  ca 
used  to  enhance  the  capability  of  the  system  to  ha 
more  classes  with  fewer  filters  and  other  selectee 
vanced  considerations. 

II.  Filter  Synthesis  and  Associative  Memory  Formul 

The  conventional  heteroassociative  memory  fo 
lation  uses  unit  output  vectors  with  the  location  < 
denoting  the  recollection  vector  or  class  assoc 
with  the  input  data.  V arious  associative  memory 
thesis  techniques  and  realization  architectures 
been  described.4  13  We  consider  an  efficient,  1 
capacity  multifilter  associative  memory.  A  si 
conventional  optical  associative  processor  is  sho\ 
Fig.  1.  It  has  an  input  key  vector  x  at  Pj,  whi 
multiplied  by  the  associative  memory  matrix  M 
Fig. 1. Optical parallel n
lion  of  multifilter  coding  <a> 
tive  memory  scheme) 


to give an output recollection vector y = M x at P3. In our
associative memory synthesis, we use k = F filters h_k as the columns
of M. The P3 output vector thus has F elements, each of which is the
vector inner product of the x input and one of the h_k filters. For
the case of F = 2 filters (h_1 and h_2) at P2 with binary thresholded
P3 outputs, the four possible F = 2-bit output vectors are noted in
Table I. Each of these can be made to correspond to a different
object class by the appropriate output binary encoding. For the
general case of F filters (F columns in the matrix at P2), the F-bit
output can accommodate 2^F classes of objects (it is often preferable
to allow 2^F - 1 output classes to avoid the all-zero output vector,
which can also occur with no P1 input). The associative memory matrix
M need only be M × F, where M is the dimension of the input vector.
Conventional associative memories require far larger matrix sizes and
would provide at most recognition of F rather than 2^F output classes
(while also requiring F << M for most conventional associative memory
formulations).
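
The capacity argument can be illustrated by enumerating the available
output codes. The Python sketch below is an illustration only; the
function names and the class labels are hypothetical. It lists the
F-bit binary codes, optionally excluding the all-zero vector, and
assigns one code per class.

```python
from itertools import product

def output_codes(num_filters, allow_all_zero=False):
    """All binary output vectors available from num_filters thresholded filters."""
    codes = list(product((0, 1), repeat=num_filters))
    if not allow_all_zero:
        # Skip 00...0, which also occurs when no input is present.
        codes = [code for code in codes if any(code)]
    return codes

def assign_classes(class_names, num_filters):
    """Give each class its own output code; fail if there are too few codes."""
    codes = output_codes(num_filters)
    if len(class_names) > len(codes):
        raise ValueError("not enough filters for this many classes")
    return dict(zip(class_names, codes))

print(output_codes(2, allow_all_zero=True))   # the four 2-bit vectors
print(assign_classes(["A", "B", "C"], 2))     # 2^2 - 1 = 3 usable classes
```

A unit-vector (one filter per class) encoding would need as many
filters as classes, whereas here the number of filters grows only
logarithmically with the number of classes.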

Synthesis  of  this  matrix  (and  its  associated  filter 
vectors  or  columns)  has  been  well-documented1:ua 
and is thus only briefly highlighted here. We begin with several
images in each of several classes and form their vector inner product
matrix V. We then invert V and multiply it by a matrix P whose rows
are the desired F-digit output recollection vector codes y_k for each
input key vector x_k. The rows of the resultant matrix A = V^-1 P
specify each filter function h_k as a linear combination of all the
original key vectors. The F filters h_k are then used as the columns
of the matrix M at P2 of Fig. 1, and the F-digit y output at P3 will
be the binary code for the 2^F different object classes desired and
specified by the P matrix. The more general version of this
associative memory synthesis algorithm uses F filters with L
different levels allowed in each of the F output P3 digits. This
allows us to represent L^F object classes with an associative memory
with only F column vectors. We will restrict attention here to the
case of binary output vectors (because the error-correcting
techniques we will be describing will be much simpler for this case).
In this paper, we consider techniques to improve the performance of
such associative processors by using coding theory to allow the
detection and correction of digit errors in the output y vector. We
will also restrict attention to
projection filters (with extensions to correlatior
other  advanced  filters1*  following  directly).  Whe 
key  vectors  are  chosen  properly  (as  statistically  n 
sentative  of  the  data),:i  this  associative  processor 
forms  quite  well  and  the  output  vector  denote; 
reference  recollection  vector  most  closely  assoc 
with  the  x  test  vector.  The  matrix  can  also  be  syi 
sized  to  output  a  reference  key  vector  x^  most  cl 
associated  with  a  partial  or  noisy  input  key  vector 
is  an  autoassociative  memory  matrix).  In  this  p 
we  will  consider  only  a  heteroassociative  memory 
trix,  although  an  autoassociative  memory  matrii 
mulation  as  well  as  the  cascade  of  an  autoassoci 
and  a  heteroassociative  memory  matrix  is  possibh 
yields  excellent  results.  Our  main  attention  wi 
given  to  providing  error-correction  ability  to  this 
ciative  processor  in  addition  to  the  initial  error-co 
tion  ability  the  system  possesses  as  an  associ 
memory  and/or  nearest-neighbor  processor.  Sue 
ditional  error  correction  is  necessary  when  parti; 
put  vectors  are  present,  when  the  dynamic  range  c 
optical  processor  implementing  the  memory  is  lo 
when  input  or  output  noise  is  large.  The  basic  e 
correction  techniques  advanced  should  be  suitabi 
most  associative  processor  synthesis  algorithms 
architectural  realizations. 
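
The synthesis recipe summarized in this section (form V, compute
A = V^-1 P, build the filters as linear combinations of the key
vectors) can be tried on a toy example. The following Python sketch is
a hedged illustration, not the authors' implementation: the key
vectors and codes are invented data, the helper names are
hypothetical, and it adopts the convention that P has one row per key
vector, so that each column of A holds the combination coefficients of
one filter.

```python
from fractions import Fraction

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def solve(v, p):
    """Solve V A = P for A by Gauss-Jordan elimination (V must be nonsingular)."""
    n = len(v)
    aug = [[Fraction(x) for x in v[i]] + [Fraction(x) for x in p[i]]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Toy data (assumed for illustration): three key vectors, one per class,
# and the desired F = 2-bit output code for each.
KEYS = [[1, 0, 1, 0, 1],
        [0, 1, 1, 0, 0],
        [1, 1, 0, 1, 0]]
CODES = [[0, 1], [1, 0], [1, 1]]

V = matmul(KEYS, transpose(KEYS))      # vector inner product matrix
A = solve(V, CODES)                    # A = V^-1 P
FILTERS = matmul(transpose(KEYS), A)   # columns are the F filters h_k

def recall(x, filters=FILTERS):
    """Inner products of x with each filter column, thresholded at 1/2."""
    y = matmul([x], filters)[0]
    return [1 if value > Fraction(1, 2) else 0 for value in y]

for key, code in zip(KEYS, CODES):
    print(key, "->", recall(key), "expected", code)
```

Exact rational arithmetic is used so that the stored key vectors
reproduce their codes without rounding; a practical system would work
with noisy analog values, which is where the thresholding and the
error-correcting codes discussed next become important.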

III.  Error-Correction  Coding 

The  basic  idea  of  coding  theory  is  to  represent  a 
output  by  n  >  k  bits  in  order  to  allow  for  error  co 
tion.  A  simple  example  of  a  binary  coding  techr 
which  adds  redundant  bits  is  the  parity  bit  schen 
which  one  extra  bit  is  added  to  a  fc-bit  represents 
The  extra  bit  indicates  if  the  number  of  ones  i; 
code  is  odd  or  even.  This  technique  helps  dete> 
rors  but  cannot  correct  them.  F or  our  present  ap| 
tions,  it  is  desirable  to  use  a  coding  scheme  tha 
allow  for  correction.  The  choice  of  coding  schen 
use  for  this  problem  is  exhaustive.  Many  (k  + 
codes  exist  that  can  allow  recognition  of  2k  objects 
various  abilities  to  detect  and  correct  errors.  A  vs 
of  decoding  schemes  also  exist.  The  group  of  cod 
chose  to  investigate  is  the  linear  block  codes.  r 
have  the  ability  to  correct  errors.  The  more  bit  < 
that  a  code  is  able  to  correct,  the  more  redundan 
one  needs  in  the  representation.  Linear  block  cc 
are  described  in  terms  of  generator  matrices  G,  p 
check  matrices  H 1 ,  and  a  syndrome  vector  s.  An 
linear  code  uses  n  bits  to  represent  a  /r -bit  code 
our  binary  case)  2k  different  objects  where  n  >  k 

To  demonstrate  the  concept,  we  specifically  ch 
use  a  (7,4)  Hamming  code.  A  Hamming  code  is 
able  because  it  involves  a  matrix-vector  multiplii 
for  decoding  (and  this  operation  can  be  implem 
digitally  or  optically  using  a  nonlinearity  su 
Table II. Decoding Table for the (n,k) = (7,4) Hamming Code

Syndrome    Bit in error    Coset leader
000         none            0000000
100         1               1000000
010         2               0100000
001         3               0010000
110         4               0001000
011         5               0000100
111         6               0000010
101         7               0000001


thresholding). A (7,4) Hamming code uses 7 bits to encode 4-bit data.
It can thus accommodate 2^4 = 16 different inputs or classes, and the
code has 7 - 4 = 3 redundant bits. This particular code can detect
and correct a 1-bit error in the output. We now provide a brief
review of conventional Hamming code theory.14 The n-bit code is
derived by multiplying each possible k-bit message by a k × n matrix
G known as the generator matrix. The (7,4) Hamming code is derived by
multiplying each 4-bit message u (i.e., 0000, 0001, ... or 1111) by

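G = \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}. \qquad (1)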

In coding theory, vectors are row vectors (u^T is a column vector), a
matrix-vector multiplication is written as uG, and all
multiplications are modulo 2. We will retain this notation and usage.
For the message u = [1101], the n-bit code word would be uG =
[0001101]. For example, the first element of uG is

[1\;1\;0\;1]\,[1\;0\;1\;1]^T = (1 + 0 + 0 + 1)_2 = 2_2 = 0. \qquad (2)

The G matrix can be written as an augmented matrix G = [P | I], where
I is a k × k identity matrix (here k = 4) and P is a k × (n - k) =
4 × 3 matrix with 0 and 1 values chosen for the specific code.

To decode a received message r (of n bits) to produce the original
k-bit message, we multiply r by a parity-check matrix H^T, where
H = [I_(n-k) P^T]. In our example, I is an (n - k) × (n - k) = 3 × 3
identity matrix, H is 3 × 7, and H^T is 7 × 3. The product rH^T
yields a syndrome vector s of dimension n - k = 3 for our example.
Note that this rH^T multiplication is also modulo 2. The syndrome
vector tells us if an error has occurred in transmission and which
bit is in error. If s = 0 (the zero vector), no error has occurred. A
nonzero vector s indicates the presence of an error as well as which
of the n bits in the received message is in error. Table II shows the
eight possible 3-bit syndrome vectors for our (n,k) = (7,4) code
example, the associated bit that is in error, and an n = 7-bit unit
vector e called a coset leader. The location of the 1 in e indicates
which bit in the received message is in error. The corrected received
code is obtained by adding e to r modulo 2 (with no carries). This
correction operation can be performed by a bit-by-bit exclusive-OR of
e with r.
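
The encoding, syndrome, and correction steps just described can be
exercised end to end in a few lines. The Python sketch below is an
illustration under stated assumptions rather than the processor
described in this paper: the parity block P is assumed to be the
standard systematic (7,4) Hamming choice, selected because it
reproduces the worked example u = [1101] -> [0001101], and the lookup
table is built digitally in place of the associative memory proposed
in the text. All function names are hypothetical.

```python
K, N = 4, 7

# Assumed parity block P (4 x 3) of G = [P | I4]; see the note above.
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1],
     [1, 0, 1]]

# Generator matrix G = [P | I_k] and transposed parity check H^T = [I_(n-k); P].
G = [P[i] + [1 if j == i else 0 for j in range(K)] for i in range(K)]
HT = [[1 if j == i else 0 for j in range(N - K)] for i in range(N - K)] + P

def vecmat(v, m):
    """Row vector times matrix, with all arithmetic modulo 2."""
    return [sum(a * b for a, b in zip(v, col)) % 2 for col in zip(*m)]

def encode(u):
    return vecmat(u, G)

def syndrome(r):
    return vecmat(r, HT)

# Syndrome -> coset leader table: the zero syndrome plus one entry per
# single-bit error pattern.
COSET_LEADER = {(0, 0, 0): [0] * N}
for bit in range(N):
    e = [1 if j == bit else 0 for j in range(N)]
    COSET_LEADER[tuple(syndrome(e))] = e

def correct(r):
    """Add the coset leader to r modulo 2 (bit-by-bit exclusive-OR)."""
    e = COSET_LEADER[tuple(syndrome(r))]
    return [a ^ b for a, b in zip(r, e)]

def decode(r):
    """Corrected message bits; with G = [P | I] they are the last k bits."""
    return correct(r)[N - K:]

code_word = encode([1, 1, 0, 1])
received = code_word[:]
received[5] ^= 1                      # inject a single-bit error
print(code_word, syndrome(received), decode(received))
```

Only single-bit errors are handled correctly by this table; a received
vector with two or more errors maps to the wrong coset leader, which
is the limitation taken up in the discussion of the three syndrome
cases.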

The  relationship  between  e  and  s  is  usually  imple¬ 
mented  in  a  lookup  table.  We  propose  to  achieve  this 


x  M 


Fig.  2.  General  block  diagram  of  an  error-correcting  associa 
processor. 


with  an  associative  memory  to  determine  the  s; 
drome-coset  leader  association.  Since  each  syndro 
vector  s,  corresponds  to  a  coset  leader  e,,  we  will  p 
duce  an  s,  for  each  input  e,  by  a  matrix-vector  multij 
cation  by  a  matrix  Y  that  satisfies  s,Y  =  e,  for  all  (s,. 
pairs.  If  we  place  each  s,  in  the  ith  row  of  a  matri: 
and  each  e,  in  the  it h  row  of  a  matrix  E,  then  Y 
specified  by 

SY = E. \qquad (3)

Equation  (3)  can  be  solved  in  several  ways4:  in  a  lea 
squares  sense  as  Y  =  (STS)_1E,  or  iteratively,  or  fr< 
the  outer-product  approximation  (assuming  orthoi 
nal  vectors  s,  such  that  S_1  =  S7) 

Y = \sum_i s_i^T e_i. \qquad (4)

Figure 2 shows the general block diagram of our
proposed error-correcting associative processor. The
n = 7-bit Hamming-coded received vector r is output
from the first associative processor (Fig. 1 with the $P_2$
matrix synthesized using projection filters). It is then
decoded by multiplication with the parity-check matrix
$H^T$. The n - k = 3-bit syndrome vector s produced
is then converted to the coset leader vector e in
the second associative processor shown (which has the
same form as Fig. 1 with a different $P_2$ matrix). The
presence of a bit error and which bit (if any) is in error
is determined by e. The final box then produces the
corrected n-bit u code. The vector operations performed
are modulo-2 (no carries) and, thus, the separate
operations cannot be combined conventionally
into one matrix-vector processor. They can, however,
be combined into one table lookup associative processor
(Fig. 2). The use of additional nonlinearity
appears to be beneficial in such processors, and we
employ the system in the form shown in Fig. 2. To
emphasize the nonlinear nature of the various operations,
we include nonlinear (NL) units in Fig. 1. These
nonlinear units also include thresholding operations to
reduce noise effects and to improve performance.

Let us now discuss how s and e provide error correction.
Three situations can occur for the s output of
any Hamming code. We discuss these for our case of
an (n,k) = (7,4) code:

(1) The received vector is one of the sixteen allowable
n = 7-bit codes uG for the k = 4-bit words u. In
this case, s and e will be zero. The received code word
r will be correct and the final n-bit word u will be
correct.

(2) The received vector r has a 1-bit error. In this
case, s will be one of the $2^{n-k} - 1 = 7$ nonzero syndrome
vectors and e will denote which bit is in error (see Table
II). In this case, e and r can always correct the error to
yield the correct u.

(3) The received vector r has more than 1-bit error.
In this case, the vector will be (incorrectly) corrected to
one of the sixteen Hamming code words. This is because
the Hamming code is designed such that each of
the sixteen 7-bit code words has seven 7-bit received
words that differ from it in 1 bit. The seven
words which are 1-bit different are unique to each of
the sixteen code words. Therefore, the 16 × 7 = 112
seven-bit words will always be corrected to one of the
sixteen error-free Hamming code words. Including
the original sixteen Hamming code words, 112 + 16 =
128 (all $2^n$) possibilities for 7-bit outputs are accounted
for.

Further  details  on  Hamming  codes  and  other  linear 
block  codes  are  provided  in  many  texts.1417  Other 
coding  schemes  can  allow  the  presence  of  more  than  1- 
bit  error  to  be  detected  and,  therefore,  provide  a  no 
decision  output  state  possibility. 
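
The three cases above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the generator matrix rows are taken from the Table IV code words for the 4-bit words 0001, 0010, 0100, and 1000, while the bit ordering, the systematic parity-check construction H = [I | P^T], and the names `syndrome`, `COSET_LEADERS`, `correct`, and `all_code_words` are assumptions made for illustration.

```python
import numpy as np

# Generator matrix G = [P | I4]. The rows are the Table IV code words for the
# 4-bit words 0001, 0010, 0100, 1000 (the bit ordering is an assumption).
G = np.array([[1, 1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 0, 0, 1]])
P = G[:, :3]
# Transpose of the systematic parity-check matrix H = [I3 | P^T] (assumed form).
H_T = np.vstack([np.eye(3, dtype=int), P])


def syndrome(r):
    """3-bit syndrome s = r H^T (modulo 2), returned as a tuple."""
    return tuple(((r @ H_T) % 2).tolist())


# Lookup table from syndrome to coset leader (a unit vector or the zero vector).
COSET_LEADERS = {(0, 0, 0): np.zeros(7, dtype=int)}
for bit in range(7):
    e = np.zeros(7, dtype=int)
    e[bit] = 1
    COSET_LEADERS[syndrome(e)] = e


def correct(r):
    """Add the coset leader to r modulo 2, i.e., a bit-by-bit exclusive-OR."""
    return r ^ COSET_LEADERS[syndrome(r)]


def all_code_words():
    """The sixteen 7-bit code words uG (modulo 2)."""
    words = []
    for m in range(16):
        u = np.array([(m >> b) & 1 for b in range(4)])
        words.append((u @ G) % 2)
    return words
```

Flipping any single bit of a code word and calling `correct` returns the original code word, whereas flipping two bits returns a different (wrong) code word, as described in case (3).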

IV.  Output  Probability  of  Error  and  Noise  Model 

Coding techniques perform well if the probability p
of a bit error is small. In this section we derive the
amount of noise that the coding scheme can tolerate
and still be effective. In our specific Hamming code
example, the probability of an error in any bit of the
noncoded 4-bit representation is

$P_1(e) = 1 - (1 - p)^4.$  (4)

The probability of error for the 7-bit Hamming code is the
probability that two or more bit errors occur or

$P_2(e) = 1 - (1 - p)^7 - 7p(1 - p)^6,$  (5)

which is 1 minus the probability that none or 1-bit
errors occur. If p is small, then we can use the series
expansion of $(1 - x)^n$ to approximate (4) and (5) by

$P_1(e) \approx 1 - (1 - 4p) = 4p,$  (6)

$P_2(e) \approx 1 - (1 - 7p + 21p^2) - 7p(1 - 6p) = 21p^2.$  (7)

The approximation used in (6) and (7) holds if $P_1(e) > P_2(e)$,
i.e., if

$p < 4/21$ or $p < 0.2.$  (8)

Therefore, for small p, Eq. (6) is greater than Eq. (7)
and we expect an advantage in using the coding methods
to correct errors. If p is large (i.e., if the noise is
large), coding may not be beneficial.
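
As a quick numerical illustration of Eqs. (4)-(7), the sketch below evaluates the exact error probabilities next to their small-p approximations. The function names `p_uncoded` and `p_hamming` are hypothetical and chosen only for this example.

```python
def p_uncoded(p, k=4):
    """Eq. (4): probability that at least one of the k uncoded bits is wrong."""
    return 1.0 - (1.0 - p) ** k


def p_hamming(p, n=7):
    """Eq. (5): probability of two or more bit errors among the n coded bits."""
    return 1.0 - (1.0 - p) ** n - n * p * (1.0 - p) ** (n - 1)


if __name__ == "__main__":
    print("p      P1 exact  4p      P2 exact  21p^2")
    for p in (0.01, 0.05, 0.10, 0.19):
        print(f"{p:<6} {p_uncoded(p):<9.4f} {4 * p:<7.4f} "
              f"{p_hamming(p):<9.4f} {21 * p * p:.4f}")
```

For small p the approximations track the exact values closely; they loosen as p approaches the limit in Eq. (8).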

We now derive a first-order estimate of the amount
of noise our system can tolerate and still provide error-correction
ability. We model the noise fed to the
output of the system as a Gaussian zero-mean variable
n. The noise n is generated and added to the output c
of the filter to produce c' = n + c. This is then
thresholded at +0.5 and the pixels or elements of the
received signal become 0 if c' < 0.5 and 1 otherwise.
This is the output r of our noisy system. The variance
$\sigma^2$ of the additive noise is related to p as we now detail.
From (8), we require p < 0.2 to satisfy our approximations
in (6) and (7). We assume that the probability
that any bit is a 1 (or a 0) is 0.5. Therefore, if an output
element of the noiseless system is 1, it will become 0 if n
< -0.5; similarly, if an output element of the noiseless
system is 0, it will become 1 if n > 0.5. For Gaussian
noise, the probability of a bit transition error is thus

$p = 0.5\,(1/2\pi\sigma^2)^{1/2} \int_{-\infty}^{-0.5} \exp(-x^2/2\sigma^2)\,dx + 0.5\,(1/2\pi\sigma^2)^{1/2} \int_{0.5}^{\infty} \exp(-x^2/2\sigma^2)\,dx.$  (9)

We denote the Gaussian distribution as

$G(x) = (1/2\pi)^{1/2} \int_{-\infty}^{x} \exp(-t^2/2)\,dt,$  (10)

and note that $G(-x) = 1 - G(x)$. Using this symmetry
property, Eq. (9) becomes

$p = 1 - G(0.5/\sigma).$  (11)

For this to be <0.2, we require $G(0.5/\sigma) > 0.8$ or

$\sigma < 0.6$ or $\sigma^2 < 0.36.$  (12)

Therefore, we expect that when the input noise has a
variance $\sigma^2 < 0.36$, we will obtain better results with
error-correcting coding methods. In Sec. V we show
the results obtained for several values of $\sigma^2$.
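
The bound in Eq. (12) can be reproduced numerically from Eq. (11). This is a small sketch under the stated Gaussian model; the names `bit_error_prob` and `max_sigma` and the bisection search are assumptions introduced for illustration, and the standard normal tail is computed with `math.erfc`.

```python
import math


def bit_error_prob(sigma):
    """Eq. (11): p = 1 - G(0.5 / sigma), with G the standard normal CDF."""
    return 0.5 * math.erfc(0.5 / (sigma * math.sqrt(2.0)))


def max_sigma(p_limit=0.2, lo=1e-3, hi=10.0, iters=60):
    """Largest sigma with bit_error_prob(sigma) < p_limit, found by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bit_error_prob(mid) < p_limit:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    s = max_sigma()
    print(f"sigma < {s:.3f}, sigma^2 < {s * s:.3f}")
```

The search returns a standard deviation just under 0.6, in line with the rounded values quoted in Eq. (12).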

V.  Simulation  Results 

The training set or key vector for our projection
filters consisted of 16 images of letters (capitals A-H
and small letters a-h) from the New York Times font.
These 64 × 64 pixel images are shown in Fig. 3. The
letter occupies ~20 × 20 pixels of the entire image.
We calculated F = 4 filters digitally off-line using the
algorithm in Sec. II, with each filter being of dimension
$64^2$ and a linear combination of all $2^k = 2^F = 2^4 = 16$ key
vector test characters. These four filters were used as
the columns of the $64^2 \times 4$ matrix at $P_2$ of Fig. 1. The
four vector inner product outputs at $P_3$ of Fig. 1 represent
the k = 4-bit coded vector u (before error-correction
encoding) that denotes the object class (the input
letter and if it is a capital or lowercase letter). The
sixteen coded vectors u and the letters to which they
correspond are noted in Table III. These also represent
the actual $P_3$ output obtained from Fig. 1 (by
simulation) for the case of F = 4 filters and L = 2
output levels.

We then synthesized a second associative processor
matrix with error correcting using the (n,k) = (7,4)
Hamming code. The associative matrix now consisted
of n = 7 filters of $64^2$ elements each as the column
vectors in the $64^2 \times 7$ matrix at $P_2$ of Fig. 1. Each of the
F = n = 7 filters was again a linear function of the
sixteen original key vectors and these filters were calculated
by a straightforward extension of the method
outlined in Sec. II. The n = 7-bit output associated
code words (the chosen projection values used in the
algorithm) for the sixteen key vectors are given in
Table IV. These are also the noiseless $P_3$ outputs
obtained from Fig. 1.


Fig. 3. 64 × 64 pixel training set or key vector input images from
The New York Times font (each image is 64 × 64 pixels).


Table III. Nonerror-Correcting Output with Training Set (No Noise), F = 4, L = 2

Letter   Code word   Letter   Code word
A        0000        a        1000
B        0001        b        1001
C        0010        c        1010
D        0011        d        1011
E        0100        e        1100
F        0101        f        1101
G        0110        g        1110
H        0111        h        1111


Table IV. Reference Key Vector Images and Associated Output Code
Words from the (7,4) Hamming Coded Associative Processor with F = 7, L
= 2 and No Noise

Letter   Code word   Letter   Code word
A        0000000     a        1010001
B        1101000     b        0111001
C        0110100     c        1100101
D        1011100     d        0001101
E        1110010     e        0100011
F        0011010     f        1001011
G        1000110     g        0010111
H        0101110     h        1111111

In Table V we present a summary of our results when
noise was added to the output vector from the associative
processor. We varied $\sigma^2$ in order to determine the
noise level for which the error-correcting coding
scheme is advantageous, and to determine the improvement
it provided over a nonerror-correcting
code. For each value of $\sigma^2$, Table V gives the percent of
the sixteen key vector images correctly classified for
the 4-bit and 7-bit error-correcting Hamming code
with the number of errors in parentheses. Note that
any error in one of the 4-bit outputs will be an error and
that outputs with two or more bit errors will be errors
for the Hamming code associative processor. The last
column gives the number of errors (out of a maximum
of 16) corrected in the 7-bit error-correcting processor.


Table V. Performance of 4-Bit vs 7-Bit Hamming Code Associative
Processors for Various Levels of the Noise Variance $\sigma^2$ (the Total Number
of Images Is 16)

           No error correction,
           4-bit code:           Hamming code:         Number of
Noise      percent correct       percent correct       corrected errors in
$\sigma^2$ (number of errors)    (number of errors)    Hamming processor
0.15       81% (3)               100% (0)              7
0.20       50% (8)               81% (3)               9
0.25       50% (8)               62% (6)               8
0.40       38% (10)              56% (7)               4
0.50       38% (10)              44% (9)               6
0.60       31% (11)              31% (11)              [illegible]


Fig. 4. Noisy A with $\sigma^2$ varied.

From the results in Table V, the error-correcting
coding provided better results for values of $\sigma^2 \le 0.5$.
However, the results are significantly better for $\sigma^2 <
0.4$ (classification is 81% for $\sigma^2 = 0.2$ and 100% for $\sigma^2 =
0.15$) and only 6% better than the 4-bit scheme for $\sigma^2 =
0.5$. Thus for large noise levels, the improvement obtained
by error-correcting encoding is less significant
and not necessarily worth the added memory storage
and calculations. Significantly better performance
occurs for lower noise levels in agreement with the
theory in Sec. IV. In Table VI we list the 4- and 7-bit
outputs obtained at $P_3$ of Fig. 1 for the case of noise
with $\sigma^2 = 0.2$. In the output from the 4-bit projection
filter associative processor, an * indicates an error. In
the output from the 7-bit Hamming code scheme, an *
indicates an uncorrectable error, i.e., 2 or more bits in
error, and ** indicates a correctable error.

In the previous noise tests, the noise was added
directly to the output recollection vector (since for this
case we could obtain a theoretical performance estimate).
To determine the effect of noise in the input
image on the probability of an incorrect bit in the
output plane, we require simulations. There is no
method to directly calculate this relationship mathematically
since each image and noise representation
will behave differently. To estimate the amount of
input plane noise for which the error-correcting coded
output will provide a higher classification rate than the
nonerror-correcting coded output, we varied the
amount of noise (measured by $\sigma^2$) added to the input
training images. We could then approximate the
probability p of an incorrect bit by the number of
incorrect bits in the output divided by the total number
of bits. We use ten realizations of the noise for
each $\sigma^2$ value to obtain better statistics. Zero-mean
Gaussian noise (with a specified $\sigma^2$) is added to each
pixel in the image and the pixel is rethresholded at 0.5
to obtain the noisy binary input image. Figure 4 shows
sample versions of the letter A with varying degrees of
noise. Notice that the additive zero-mean noise clutters
the background as well as drops out data from the
letter.
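
The input-noise procedure described above (add zero-mean Gaussian noise to each pixel, then rethreshold at 0.5) can be sketched as follows. This is an illustrative sketch only: the function name `add_input_noise`, the use of NumPy's random generator, and the synthetic 20 × 20 block standing in for a letter are assumptions, not the authors' simulation code.

```python
import numpy as np


def add_input_noise(image, variance, rng):
    """Add zero-mean Gaussian noise of the given variance to a binary image
    and rethreshold at 0.5 to obtain a noisy binary image."""
    noise = rng.normal(0.0, np.sqrt(variance), size=image.shape)
    return ((image + noise) >= 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.zeros((64, 64), dtype=int)
    image[22:42, 22:42] = 1          # stand-in for a ~20 x 20 pixel letter
    for variance in (0.2, 0.6, 1.0):
        noisy = add_input_noise(image, variance, rng)
        print(variance, (noisy != image).mean())   # fraction of flipped pixels
```

Both background pixels (0 to 1) and object pixels (1 to 0) are flipped, which matches the cluttered background and dropped-out letter data noted for Fig. 4.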

We performed ten runs for each $\sigma^2$ value for all
sixteen original key image vectors for the 4-bit and 7-bit
output coded associative processor. For each $\sigma^2$
value, there are sixteen images, with 4 and 7 output bits
from the processor and ten runs. The total number of
bits considered was (10 runs) × (4 + 7 bits) × (16 images)
= 1760. From the number of bit errors out of the total
of 1760, we estimate p for the different $\sigma^2$ values. The
results are shown in column 2 of Table VII. For input
noise variance of 0.4-1.0, the estimated value of p
ranges from 0.02 to 0.13. Figure 4 shows how poor the
input SNR is even with $\sigma^2 = 0.6$. The percentage of


Table VI. Output from $\sigma^2 = 0.2$ Output Noise Tests

         4-Bit code   (7,4) Hamming   Corrected
Letter   output       code output     Hamming code
A        0010*        0000001**       0000000
B        0001         0111000*
C        0010         0110110**       0110100
D        0011         1011001*
E        1100*        1110010
F        0001*        0001011*
G        1011*        1010110**       1000110
H        0111         0101100**       0101110
a        0000*        1010011**       1010001
b        1001         1111001**       0111001
c        1000*        1101101**       1100101
d        1001         0001101
e        1100         0000011**       0100011
f        1100*        1001011
g        1110         0010011
h        1111         0111111**       1111111

Note: For the 4-bit code result the * indicates error; for the 7-bit
(7,4) Hamming code result the * indicates uncorrectable error; the **
indicates correctable error.

Table VII. Estimated Probability p of an Output Bit Transition Error for
Input Images with Various Noise $\sigma^2$

Noise        Bit error
variance     probability   4-Bit output         Hamming code
$\sigma^2$   p             (number of errors)   (number of errors)
0.4          0.02          89% (18)             94% (10)
0.5          0.04          81% (34)             95% (8)
0.6          0.08          73% (44)             91% (15)
0.7          0.10          65% (45)             85% (20)
0.8          0.11          63% (59)             80% (32)
1.0          0.13          54% (73)             69% (50)
1.3          0.18          43% (92)             63% (59)

Note: Each p estimate is based on ten runs. The percentage of the
total 160 images for each $\sigma^2$ value correctly classified for the 4-bit and
Hamming code schemes are listed (with the number of images misclassified
given in parentheses).


the 160 images (16 characters × 10 runs) per $\sigma^2$ value
classified correctly and the number of errors are included
in parentheses for the two coding schemes. For
the (7,4) Hamming code, the percentage correctly classified
includes those output codes which originally had
1-bit error which were corrected by the postprocessing.
In all cases, the error-correcting coding provided
considerably improved classification rates and performance
compared to the four-filter (or 4-bit) output.
As the input noise variance increases, the classification
rate is lowered for both the error-correcting and nonerror-correcting
associative processors. For $\sigma^2 = 1.3$, we
estimate p at 0.18, which is close to the theoretical limit
(of 0.2) estimated in Sec. IV. In this case, the classification
rate for the Hamming code processor is only
63%; however, it is still a significant improvement over
the 43% classification rate for the four-filter output.
As $\sigma^2$ is increased further, we find the classification
rate for both the error-correcting and nonerror-correcting
processors to be too small to be useful. As
seen, a considerable amount of input noise can be
tolerated and the error-correcting associative processor
will still perform well. In Sec. VI we discuss alternative
more advanced codes which are able to correct
more bit errors.


VI. Advanced Considerations

From the results presented in the previous section, it
is evident that coding schemes can significantly improve
classification results in the presence of noise in
the input and output. Two more issues we will consider
here are (1) the handling of more classes and (2) the
correction of more bit errors. We consider binary
output vectors initially. In the nonerror-correcting F-filter
case, we can increase the number of objects to be
recognized by increasing F = k, the number of filters
and bits in the output code. In an F-filter scheme with
F = 6, we can handle up to $2^6$ different objects. This
would be sufficient to classify the entire alphabet
(both lowercase and uppercase) along with the ten
numeric characters 0-9. If we wanted to extend this to
an error-correcting linear block coding scheme, we
would need an (n,k) code with k at least equal to 6. If
we also wish to implement a coding scheme that is able
to correct more than 1-bit error, we will need to synthesize
and store more filters (for the extra redundant
bits). Since a Hamming code can only correct 1-bit
error, we must use other available coding schemes.
Binary Bose, Chaudhuri, and Hocquenghem (BCH)
codes are one viable alternative.18,19 For example,
there exists a (15,7) BCH code that could handle 128
classes and correct 2-bit errors, but fifteen projection
filters must be used. With thirty-one projection filters
we could implement a (31,6) code that could handle
the alphabet and correct up to 7-bit errors.
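
The class counts and code parameters quoted above can be sanity-checked with a few lines. The sketch below uses the sphere-packing (Hamming) bound, a standard necessary condition from coding theory that is not part of the paper's own argument; passing it does not by itself prove that a code exists. The names `num_classes` and `satisfies_hamming_bound` are hypothetical.

```python
from math import comb


def num_classes(k, levels=2):
    """Number of distinct output code words with k symbols of `levels` levels."""
    return levels ** k


def satisfies_hamming_bound(n, k, t):
    """Necessary condition for a binary (n,k) code to correct t errors:
    the 2^k spheres of radius t must fit inside the 2^n binary words."""
    return sum(comb(n, i) for i in range(t + 1)) <= 2 ** (n - k)


if __name__ == "__main__":
    print(num_classes(6), "classes for k = 6; alphabet plus digits needs", 26 + 26 + 10)
    for n, k, t in ((7, 4, 1), (15, 7, 2), (31, 6, 7)):
        print((n, k), "t =", t, satisfies_hamming_bound(n, k, t))
```

The (7,4) Hamming code meets the bound with equality, which is the counting argument 16 × (1 + 7) = 128 made in Sec. III.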

BCH codes require complicated decoding techniques.
We do not provide all the details, but rather
will briefly outline the procedure in order to compare
the difficulty. With BCH codes, the syndrome s is still
calculated by a matrix-vector multiplication such as
$rH^T$, but s is now a 1 × 2t vector (where t is the number
of bit errors we desire the code to correct). The elements
$s_i$ of s will now be the sum of powers of a
parameter $\alpha$, i.e.,

$s_i = (\alpha^{j_1})^i + (\alpha^{j_2})^i + \cdots + (\alpha^{j_\nu})^i.$  (13)

The coset leader demodulated vector for this case has
as its elements $j_k$ (k = 1 to $\nu$). The values of the $j_k$
indicate the locations of the errors in the original input.
To decode these output BCH s codes is more
difficult but can be realized by an iterative algorithm14
that solves Eq. (13) for $\alpha$, $\nu$, and then all $j_k$. Since $\nu \le t$,
there are multiple solutions to the set of equations in
(13), and the solution that yields an error pattern with
the smallest number of errors is the correct solution.
Furthermore, other codes exist which can correct a
number of errors (t) and can also detect if more than t
errors have occurred. With such a code, a vector output
can be classified as undecided, and, if desired, the
input can be reprocessed until no uncorrectable errors
occur.

All of our previous examples used binary coding. A
preferable coding scheme would employ multilevel filters
with multilevel output coding vectors. With L
levels and k bits, the output could handle $L^k$ different
objects instead of only $2^k$ as with a binary code. This
would significantly enhance the ability of the system to
handle more information with fewer filters. One
multilevel code is a nonbinary version of the BCH
code, the Reed-Solomon code.20 The decoding for this code is
more complicated than for the BCH code, but it can be
used.

We now consider methods to reduce the size of the
associative processor matrix (at $P_2$ of Fig. 1). In an
optical implementation, the number of filters and their
dimensionality are restricted by the size of the spatial
light modulator on which the matrix is recorded.
Using a liquid crystal TV (with 127 × 143 pixels),
we could handle 127 filters, but each can only be 143
long. If the input key vectors are lexicographic image
plane vectors, the input image size is quite limited
(~10 × 14 pixels). By representing the input as a
feature vector instead of the full image, we can significantly
reduce the dimensionality, achieve shift-invariance
and some degree of automatic distortion invariance.
The features chosen are dependent on the type
of input data and the properties required of the system
(such as shift, rotation, or translation invariance).
Typical feature spaces are Hough transforms, Fourier
transform coefficients, chord distributions, radial and
angular moments, and Fourier-Mellin coefficients.

The concepts presented here can also be extended to
encoding the outputs of several correlation filters.
Correlation filters are implemented and used
differently from the projection filters. A full correlation
of the filters with the input image is performed
(not just an inner product). The output correlation
planes are then searched for peak values above a specified
threshold. These specify the 1 or 0 elements
(peak or no peak) in the output code. The restrictions
on the number of bits in the code (or the number of
filters) depend on the number of correlations that can
be performed in parallel or rapidly in series (recall that
a full correlation must be performed and the
correlation plane searched to obtain 1 bit of the output
code). This is possible optically.20
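
The peak-or-no-peak encoding described above can be illustrated digitally. The sketch below is a stand-in for the optical correlator, assuming an FFT-based circular cross-correlation; the names `correlation_bit` and `output_code`, the per-filter thresholds, and the toy templates are assumptions made for this example.

```python
import numpy as np


def correlation_bit(image, template, threshold):
    """Correlate the template with the whole image (circular correlation via
    FFT) and return 1 if any correlation peak reaches the threshold, else 0."""
    F = np.fft.fft2(image)
    H = np.fft.fft2(template, s=image.shape)    # zero-pad template to image size
    corr = np.real(np.fft.ifft2(F * np.conj(H)))
    return int(corr.max() >= threshold)


def output_code(image, filters, thresholds):
    """One output bit per correlation filter (peak = 1, no peak = 0)."""
    return [correlation_bit(image, f, t) for f, t in zip(filters, thresholds)]


if __name__ == "__main__":
    image = np.zeros((16, 16))
    image[5:8, 7:10] = 1.0                      # a 3 x 3 object somewhere in the scene
    filters = [np.ones((3, 3)), np.ones((5, 5))]
    print(output_code(image, filters, thresholds=[8.0, 20.0]))
```

Because the full correlation plane is searched, the bit does not depend on where the object sits in the input, at the cost of one full correlation per output bit.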

VII.  Summary  and  Conclusions 

We have discussed how to use coding theory to correct
output errors from an optical associative processor.
The associative processor we use employs projection
filters for more efficient encoding of information.
Specifically we have demonstrated the ability to represent
$2^k$ (rather than just k) object classes with a k-bit
output recollection vector and a k × m associative
matrix, where m is the number of elements in the input
key vector. The output code words are selected to
enable correction of bit transition errors resulting from
either output or input noise. We tested the ability of
the coding scheme to correct errors for various
amounts of noise in the output and input, and we
showed that for small bit transition error probabilities
(p < 0.2), the coding scheme improved results. The
example chosen was a sixteen-class binary coding
problem using a (7,4) Hamming code with the ability to
correct a 1-bit error in the output. Extensions to
larger class problems and to increased error-correcting
capability were discussed.
I. Introduction

Linear transformations have been used extensively
in the literature to produce feature spaces for pattern
recognition. Transforms such as the Fourier transform,1
Mellin transform,2 and Hough transform provide
feature spaces for pattern recognition. These
transformed spaces generally make object detection
and identification easier by emphasizing or bringing
out certain features of the input image. These transforms
can also be made invariant to certain types of
distortion of the object. They also achieve a certain
amount of dimensionality reduction so that the number
of samples required to represent the input image
for the purposes of pattern recognition is small. In
this paper, we consider the Hough transform for specific
detailed realization, although the fundamental
mapping, transformation, and associative processor
techniques are quite general.

There are many reasons for considering the Hough
transform (HT). It is one such feature space which
facilitates the detection of a particular shape. It is
very attractive because it can be implemented optically
in real time and because it is a low-level feature space
and is thus well suited to parallel optical realization.
The HT has been defined in a variety of ways. It was
originally formulated for the detection of straight lines
in the input image. It has also been generalized for the
detection of other analytical curves (e.g., ellipses,
parabolas) and even arbitrary shapes.

All these transformations are linear, and a majority
of the HT ones are space invariant, i.e., the shape of
the curve to which each input point maps is the same
regardless of the position of the input. These properties
are invaluable, especially for optical implementations
of these transforms, as will be shown. We
also show how the straight-line Hough transform can
be used very effectively for pattern recognition. The
straight-line Hough space has several advantages: it
can be very easily computed optically, achieves dimensionality
reduction, and can be made invariant to
input object distortions by the use of certain
transformations. It can also be used to [...]
in curved objects. Digital methods to achieve
the linear transformations required for the HT are
slow and computationally expensive. Optical methods
to achieve the straight-line HT exist and can
be more practical and real-time. In this paper, we
advance an alternative method to compute the HT and
other linear transformations optically using an associative
memory architecture. This approach is
extremely general. It can be used to achieve generalized
HTs and, in general, any linear transformation. If the
transformation is also shift-invariant, it can be implemented
in a very simple and elegant manner using
acoustooptic cells as we will detail later. We also
provide a low-level optical associative processor system
based on the associative memory (AM) architecture.
The system produces the straight-line Hough
feature space for recognition and location of objects of
arbitrary shapes. This is achieved by the use of [...]
transformations, as we will describe.

Section II describes our new associative memory
approach to a general linear transformation and the
algorithm to obtain the required memory matrix. Section
III discusses an associative processor for
several linear Hough space transformations with
attention to their use in object recognition and their
shift-invariant properties. Section IV advances
new acoustooptic (AO) architectures for the
realization of the proposed associative memory. Section
V describes a proposed low-level optical associative
processor system for object recognition. Section
VI gives our summary and conclusions.


II.  AM  Realization  of  Linear  Transformations 

Heteroassociative memories map each vector in the
input to a corresponding vector in the output, where
the input and output vectors need not be of the same
dimension. This fact can be used to achieve linear
transformations on 1-D or 2-D inputs, where each
point in the input is mapped to a corresponding point
(or curve, a set of points) in the output.


A. Vector Representation of Mappings

The mappings to be described apply to any data
representation (e.g., feature or symbolic space) but are
best described for an input image space. Let N be the
total number of pixels in an input image. The 2-D
input image can be lexicographically represented by a
vector with N components, where each component
represents a pixel in the input image. The input vector
$x_i$ corresponding to a particular pixel in the input
image will have all zeros in it except in the ith position
where it will have a 1. (For the time being, we assume
that the input image is binary, but this assumption is
not required, as we see later.) Similarly, the output
associated with each pixel is represented by a vector $y_i$
of size M, where M is the total number of pixels in the
output. Each output vector will have nonzero values
only in those positions that correspond to the set of
pixels or curve to which the input pixel maps. The size
of the output space can be compressed to a variable
resolution, and thus $M \ne N$ is possible and generally M
< N.


B.  Construction  of  the  Heteroassociative  Memory 

Let X be a matrix with the N input vectors $x_1$,
$x_2$, ..., $x_N$ as its columns and let Y be the matrix with
the N corresponding output vectors $y_1$, $y_2$, ..., $y_N$ as its
columns. We consider the pseudoinverse associative
memory11 matrix M with the N input-output vector
pairs as the key and recollection vectors. In this case,

$Y = MX,$  (1)

where

$M = YX^+,$  (2)

and $X^+$ is the pseudoinverse of X given by

$X^+ = (X^T X)^{-1} X^T.$  (3)

Without loss of generality, we can order the input
vectors so that X is an N × N identity matrix. In this
case, X and $X^+$ are identity matrices and, therefore,

$M = Y,$  (4)

i.e., M is simply the matrix of output vectors.
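
Equations (1)-(4) can be checked with a few lines of NumPy. This is a minimal sketch assuming real-valued vectors and linearly independent key vectors; the function name `pseudoinverse_memory` and the random test data are hypothetical, and `np.linalg.pinv` is used for the pseudoinverse.

```python
import numpy as np


def pseudoinverse_memory(X, Y):
    """Eq. (2): M = Y X^+, with the key vectors as the columns of X and the
    recollection vectors as the columns of Y."""
    return Y @ np.linalg.pinv(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Unit-vector keys (X = identity): the memory is simply Y, as in Eq. (4).
    Y = rng.standard_normal((3, 5))
    print(np.allclose(pseudoinverse_memory(np.eye(5), Y), Y))
    # General linearly independent keys: M X reproduces Y, as in Eq. (1).
    X = rng.standard_normal((6, 4))
    Y = rng.standard_normal((3, 4))
    print(np.allclose(pseudoinverse_memory(X, Y) @ X, Y))
```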
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C.  Associative  Memory  Output  for  a  General  Ir 

x 

The  associative  memory  described  above  i 
input  vector  x  to  an  output  vector  v  =  1 
reference  (key)  input  vectors  satisfy  the  pro 

x.ij)  =  '>.. 

where  x,(j)  denotes  the yth  component  of  th< 
ence  vector  x,,  and 

x'x  =  . 

The  output  y  corresponding  to  a  reference  in 
x,  is 

y  =  Mx  =  Yx  =  | y,  .  y, |x 
=  >\.v  1 1 )  +  ...  +  y  vv  i.V) 


where  the  last  line  follows  from  Eq.  (5). 
output  vectors  for  the  N  reference  vectors  a 
the  desired  y,.  We  note  that  the  maximum 
reference  input-output  pairs  we  are  able  i 
equal  to  the  dimensionality  of  the  input  vect 
maximum  is  possible  because  the  input  v 
orthogonal.  In  general,  the  number  of  inp 
pairs  that  can  be  stored  in  an  associative  1 
about  an  order  of  magnitude  smaller  than  t 
sionality  of  the  input  vectors.1  ’ 

For  the  case  of  a  general  input  vector  x  cc 
ing  to  the  full  lexicographic  ordering  of  an 
input  image,  more  than  one  of  its  compone 
nonzero,  and  the  components  can  take  al 
integer  values.  The  output  vector  y  corresf 
this  input  vector  will  be 

$$y = Mx = y_1 x(1) + \cdots + y_N x(N), \qquad (8)$$

which  is  a  linear  (weighted)  combination  of 
ence  output  vectors  y,.  This  is  exactly  whai 
for  a  linear  transformation.  Therefore, 
transformation  can  be  achieved  through  th< 
associative  memory  approach.  We  now  d 
further. 
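
The linear-combination reading of the memory output can be checked numerically. The following is a minimal sketch, assuming unit-vector keys so that the memory matrix is simply the matrix of output vectors (M = Y); the function name `recall` and the small example vectors are illustrative assumptions, not taken from the paper.

```python
def recall(Y_columns, x):
    """Return y = sum_j x[j] * y_j, i.e. y = M x with M = Y."""
    out_len = len(Y_columns[0])
    y = [0.0] * out_len
    for weight, y_j in zip(x, Y_columns):
        for k in range(out_len):
            y[k] += weight * y_j[k]
    return y

# Hypothetical example: N = 3 reference pairs, output dimension 2.
Y_columns = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
key_2 = [0.0, 1.0, 0.0]    # unit-vector key: recalls the second output vector
general = [2.0, 0.0, 1.0]  # general input: recalls 2*y_1 + 1*y_3

print(recall(Y_columns, key_2))
print(recall(Y_columns, general))
```

A reference key returns its stored output vector unchanged, while a general input returns the weighted sum of the stored outputs, which is the behavior expected of a linear transformation.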

D.  Memory  Matrix  for  Linear  Shift-Invariant 
Transformations 

Equation  (4)  gives  a  way  to  construct  tl 
matrix  M  for  an  associative  memory  that  cr 
any  linear  transformation.  Let  us  assum 
transformation  is  shift-invariant  as  well  as 
the  case  of  2-D  images,  this  shift-invarian 
means  that  the  shape  of  the  curve  to  which 
pixel  maps  does  not  change,  and  if  the  posi 
nonzero  input  pixel  is  translated  by  a  cert  a 
the  positions  of  the  nonzero  output  pixels  ai 
ed  by  the  same  amount.  Since  our  input  ■ 
vectors  are  simply  the  lexicographically  oi 
sions  of  the  2-D  image  data,  a  2-D  transla 
image  is  equivalent  to  a  1-D  translation  in 
tors.  This  holds  as  long  as  the  shifted  point 


the  input  field  of  view  and  as  long  as  the  dimension  of 
the  vector  equals  the  dimension  of  the  full  input  image. 
[This  also  holds  when  multiple  objects  are  present  in 
the  2-D  input.  We  map  input  points  to  output  curves, 
and  thus  objects  (viewed  as  a  sum  of  points)  map  to  the 
sum  of  the  output  curves.]  Potential  problems  can 
arise  near  the  boundaries  of  the  2-D  input  image  if  the 
whole  output  curve  does  not  fit  in  the  2-D  output  size 
specified.  This  problem  can  be  overcome  by  slightly 
modifying the approach presented here. In the interest
of presenting the concept, we do not concern ourselves
with this case. Since the input and output
translations are equal for the shift-invariant case, it
follows that the input and output vectors should be
equal in length, i.e., M = N for the shift-invariant case.

Thus, since our reference input matrix X consists of
column vectors $x_i$ which are just translated versions of
one another, for the shift-invariant case, the reference
output matrix Y also contains $y_i$ that are translated
versions of one another. Specifically, $x_i$ is obtained by
vertically shifting $x_{i-1}$ by one unit and $y_i$ is obtained
by vertically shifting $y_{i-1}$ by one unit. Therefore, we
can write


It follows from Eq. (9) that for the case of linear
shift-invariant transformations, the matrix M is lower
triangular and Toeplitz.
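
The shift structure can be illustrated with a short sketch. It assumes that each column of M is the previous column shifted down by one element, with elements that leave the output field discarded; the function names and test vectors are hypothetical. Under that assumption M is lower triangular and Toeplitz, and applying it is the same as a causal 1-D convolution of the input with the first column.

```python
def toeplitz_memory(y1):
    """Columns y_1..y_N, where y_i is y_{i-1} shifted down by one element."""
    n = len(y1)
    return [[0.0] * shift + list(y1[: n - shift]) for shift in range(n)]

def apply_memory(columns, x):
    """y = M x, with M given column by column."""
    n = len(columns[0])
    return [sum(x[j] * columns[j][k] for j in range(len(columns)))
            for k in range(n)]

def causal_convolution(h, x):
    """y[k] = sum_{j <= k} h[k - j] * x[j]."""
    return [sum(h[k - j] * x[j] for j in range(k + 1)) for k in range(len(h))]

y1 = [1.0, 2.0, 0.0, 0.0]   # hypothetical first reference output vector
x = [1.0, 0.0, 3.0, 0.0]    # hypothetical input vector

print(apply_memory(toeplitz_memory(y1), x))
print(causal_convolution(y1, x))
```

Both calls give the same vector, so only the first column needs to be stored; the remaining columns are generated by shifting.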

E.  Memory  Matrix  for  Quasishift-Invariant  Transformations 

In this paper, our specific concern will be with the
straight-line HT for reasons explained in Sec. I.
Although most of the generalized HTs are shift-invariant,
this is not true of the straight-line HT. However,
it is shift-invariant for certain translations. We refer
to this property as quasishift invariance. In the case
of quasishift-invariant transformations, the memory
matrix M = Y can be partitioned so that
$Y = [\,Y_1 \,|\, Y_2 \,|\, \ldots \,|\, Y_{N_p}\,]$, where the column
vectors in each partition $Y_i$ satisfy Eq. (9). The
corresponding input vector elements can be similarly
partitioned so that $x = [x_1^T, x_2^T, \ldots, x_{N_p}^T]^T$.
Thus, for the case of quasishift-invariant
transformations, Eq. (8) becomes

$$y = Yx = Y_1 x_1 + \cdots + Y_{N_p} x_{N_p}. \qquad (10)$$

It is possible that the $y_i$ terms satisfying Eq. (9) are not
contiguous in the original (lexicographically ordered)
memory matrix M. In such a case, the columns of M
have to be reordered, and the elements of the input
vector also have to be reordered accordingly. This
means that the input image will now have to be ordered
(or scanned) differently to make the matrix X equal to
the identity matrix. We now illustrate these points
with examples of shift-invariant and quasishift-invariant
Hough transforms in the following section.
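
The partitioned product can be sketched as follows, assuming that within each partition the columns are successive one-element downward shifts of that partition's first column. The function names and the two-partition example are illustrative assumptions only.

```python
def shift_down(v, s):
    """Shift v down by s elements, discarding elements that leave the field."""
    return [0.0] * s + list(v[: len(v) - s])

def partitioned_product(first_columns, x_parts):
    """y = sum_k Y_k x_k, where Y_k is generated from first_columns[k]."""
    out_len = len(first_columns[0])
    y = [0.0] * out_len
    for first, x_k in zip(first_columns, x_parts):
        for s, weight in enumerate(x_k):
            shifted = shift_down(first, s)
            for i in range(out_len):
                y[i] += weight * shifted[i]
    return y

# Hypothetical example: two partitions, two input elements each, output length 3.
first_columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
x_parts = [[1.0, 2.0], [3.0, 0.0]]
print(partitioned_product(first_columns, x_parts))
```

Only one stored vector per partition is needed, which is what makes a multichannel shift-based realization attractive.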

III.  Shift-Invariant  and  Quasishift-Invariant  Hough 
Transformations 

We now give examples of shift-invariant and quasishift-invariant
Hough transformations and their associative
processor formulations. We firs
generalized  HT  for  circles  because  it  is  a  $ 
of  a  linear  shift-invariant  transformatii 
concentrate  on  the  straight-line  HT  and 
space  transformations,  since  these  are  out 
cfrn  in  thus  paper 

A.  Generalized  HT  for  Circles 

We  first  consider  the  generalized  HI' 
circles  of  a  given  radius  r.  In  this  case,  eai 
in  the  input  image  is  mapped  to  a  circle  of 
the  location  of  the  center  of  the  circle  be 
tion  of  the  input  point.  In  other  words,  tl 
maps  to  the  curve 

$$(x - a)^2 + (y - b)^2 = r^2 \qquad (11)$$

in  the  output  plane.  The  accumuh 
mappings  for  all  input  points  yields  a  peal 
with  coordinates  that  denote  the  center 
If  the  input  point  ( x ,y )  is  translated 
amount,  the  output  circle  is  translated 
amount  (see  Fig.  1).  Therefore,  in  th< 
memory  implementation  of  this  transfo 
columns  of  the  matrix  M  are  shifted  vei 
another, and each column $y_i$ of M describes
points  on  the  circle  of  specified  radius  r. 
shift-invariant  transformation.  To  dett 
other  radii,  a  new  M  is  necessary  for  each 
line  HTs  allow  for  an  easier  search  of  circ 
ent  radii  as  we  see  in  Sec.  III.E.  Generali 
similarly  defined  for  other  curves,  but  stra 
realizations (Secs. III.D and III.E) appear
able,  especially  when  distortions  or  dif 
parameters  must  be  searched. 

B. Slope-Intercept Straight-Line Hough Transform

As  another  example,  we  consider  the  ; 
tercept(c)  parametrization '  of  the  straig 
In this case, each point (x,y) in the input maps to a
straight line in the (m,c) space given by

$$y = mx + c, \quad \text{or} \quad c = -xm + y. \qquad (12)$$

This defines a straight line with slope
intercept y in the HT output (m,c) space,
initiation  of  these  mappings  for  all  points 
the  input  gives  rise  to  a  peak  in  the  output 
parameters of the input line. If the input point is
translated to another location (x',y'), the straight line
to which it maps is in general not a simple translation
of the straight line in Eq. (12), since both the slope and
intercept can change. Thus the mapping is not
shift-invariant. If, however, we translate the input point
along the y axis only, the slope of the new straight line
remains the same, only the intercept changes, and the
mapping is a simple translation (in the intercept c by
an amount y' - y) of the old line given in Eq. (12). We
use this to produce a quasishift-invariant transformation.
We scan the image vertically and note that all
the $y_i$ terms corresponding to pixels along any vertical
line in the input are shifted versions of one another and
thus fall into one partition in Eq. (10). Different
partitions are required for each column of the input.
Thus, the number of partitions is equal to the number
of columns in this case. However, when the input is
scanned along vertical lines, the $y_i$ terms that satisfy
Eq. (9) are contiguous, and the mapping is easily
achieved and used in a quasishift-invariant processor.
In all our transformation cases, the partitioning of X
and Y is such that the number of partitions $N_p$ in Eq.
(10) equals either the number of rows or the number of
columns in the image. This may not be the case for
other transformations.
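
The quasishift invariance of the slope-intercept mapping can be checked with a few lines of code. This is a minimal sketch that reads the mapping as c = y - xm for an input point (x,y); the function name, the slope samples, and the test point are illustrative assumptions.

```python
def slope_intercept_curve(x, y, slopes):
    """Intercepts c = y - x*m of all lines through (x, y), one per slope m."""
    return [y - x * m for m in slopes]

slopes = [-1.0, 0.0, 1.0]
base = slope_intercept_curve(2.0, 3.0, slopes)
moved_in_y = slope_intercept_curve(2.0, 5.0, slopes)  # point shifted by +2 in y
moved_in_x = slope_intercept_curve(3.0, 3.0, slopes)  # point shifted by +1 in x

# A y shift moves every sample by the same amount (a pure translation in c).
print([b - a for a, b in zip(base, moved_in_y)])
# An x shift changes the samples by different amounts (the line's slope changes).
print([b - a for a, b in zip(base, moved_in_x)])
```

The constant difference in the first case and the varying difference in the second show why pixels in one image column share a partition, while different columns need different partitions.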

C.  Normal  Parametrization  of  Straight-Line  HT 

The effect of input shifts on the $y_i$ vectors in the
normal parametrization of the HT is considered next.
In this case, each point (x,y) in the input maps to a
sinusoid in a $(\theta,\rho)$ Hough space given by

$$\rho = x\cos\theta + y\sin\theta = (x^2 + y^2)^{1/2}\cos[\theta - \tan^{-1}(y/x)]. \qquad (13)$$

Equation (13) describes all straight lines that could
pass through point (x,y) in terms of their normal
distance $\rho$ from the origin and the angle $\theta$ this normal
makes with the positive x axis. The accumulation of
these mappings for all the points on a straight line in
the input produces a peak in the output HT space at
the $(\theta,\rho)$ parameters of the line. In general, if the
input point is translated to a new position, both the
amplitude and phase of the sinusoid to which it maps
change. Hence the output mapping is not a simple
translation. However, if the input point is translated
so that the new point and original point lie on the same
circle centered at the origin, the output
translation  of  the  sinusoid  given  by  Eq. 
occurs  because  the  sinusoid's  amplitude  (x- 
the  new  point  remains  the  same,  and  onl 
shifts,  as  shown  in  Fig.  2.  With  this  insight, 
if  the  input  image  is  scanned  in  a  polar  f, 
normal  HT  can  be  made  to  be  a  quasishif 
mapping  that  is  shift-invariant  for  shifts 
To  avoid  scanning  the  image  in  a  polar  1 
could  perform  a  simple  rectangular-to-polai 
dinate  transformation  of  the  input  image 
conventional  raster  scan.  The  polar  tran 
verts  the  circular  translation  required  for 
invariance  into  a  linear  translation  in  0.  Th 
data  are  shift-invariant  in  <p  but  not  in  r.  1 
input  image  is  converted  to  a  polar  (r,0)  i 
tion,  a  normal  HT  of  this  polar  data  will  be 
invariant  and  will  have  partitions  of  M  wil 
that  are  shifted  versions  of  one  another.  T 
since  the  transformed  input  points  along  ar 
allel  to  the  0  axis  (i.e.,  points  in  the  origin; 
any  circle  centered  at  the  origin)  will  have 
outputs  in  the  Hough  space  that  are  tran; 
sions  of  one  another.  Each  partition  corres 
row  in  the  (r,0)  representation.  Unfortu 
though the rectangular-to-polar transforma
ear,  it  is  not  shift-invariant.  Thus  the  ; 
memory  shift-invariant  mapping  techniqu 
cuss  cannot  be  used  to  implement  the  polar 
One  could  implement  it  by  computer-genei 
gram  methods  or  by  a  camera  with  a  spec 
Therefore,  although  we  can  theoretically  irr 
polar  coordinate  transform  and  a  normal  st 
HT  using  an  AM  architecture,  we  cannot  us 
pie  and  elegant  architecture  presented  in  t 
The  normal  straight-line  HT  is  nevertheles: 
ful  for  distortion-invariant  pattern  recogni 
plained in the next section and can be easily
using other methods. !l 1,1 The preferable sys
HT  for  distortion  invariance  would  thus 
techniques  to  produce  the  HT  and  would  u 
approach  to  do  the  other  transformations  in 
space  that  are  required  for  distortion-invari 
location. 
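
The behavior of the normal parametrization under a move along a circle centered at the origin can be checked numerically. This is a minimal sketch, assuming the angle axis is sampled uniformly over a full 0 to 2*pi range so that a shift in angle is a circular shift of the samples; the sampling convention, function names, and test point are assumptions and not taken from the paper.

```python
import math

def normal_ht_curve(x, y, n_theta):
    """Samples of rho = x cos(theta) + y sin(theta) at theta_i = 2*pi*i/n_theta."""
    return [x * math.cos(2 * math.pi * i / n_theta)
            + y * math.sin(2 * math.pi * i / n_theta)
            for i in range(n_theta)]

def rotate_about_origin(x, y, angle):
    """Move (x, y) along the circle centered at the origin by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

n_theta = 8
k = 2                                   # rotate by k angular sample steps
x, y = 3.0, 1.0                         # hypothetical input point
x_rot, y_rot = rotate_about_origin(x, y, 2 * math.pi * k / n_theta)

original = normal_ht_curve(x, y, n_theta)
rotated = normal_ht_curve(x_rot, y_rot, n_theta)
shifted = [original[(i - k) % n_theta] for i in range(n_theta)]

print(max(abs(a - b) for a, b in zip(rotated, shifted)))  # close to zero
```

The sinusoid of the moved point is the original sinusoid shifted by k samples along the angle axis, with unchanged amplitude.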


D.  Hough  Space  Transformations  for  Distortior 

We  now  discuss  some  of  the  transformati 
straight-line  Hough  space  that  can  be  used 
distortion  (scale,  rotation,  and  translatio 
ant.1-  We  consider  the  normal  straight-line 
the  transformations  here  are  easily  desci 
made.  The  normal  straight-line  HT,  as  des 
Eq. (13), is not invariant to scale, rotation, ar
tion  changes  of  the  input  object.  However, 
ble  to  perform  transformations  in  the  Houg 
that  the  effects  due  to  these  distortions  are  el 

Similar  transformations  to  those  discussei 
be  derived  for  the  slope-intercept  straight 
but  these  are  much  more  complicated  and 
implemented  in  a  simple  way.  Generalizec 
also be made distortion invariant but only for one type
of  object  or  curve.  These  restrictions  do  not  apply  to 
the  normal  straight-line  HT,  as  we  describe  in  what 
follows. 

Let the input object consist of a set of line segments
and let $(\theta,\rho)$ be a point in the Hough space
corresponding to a line segment in the input object centered
at the origin. If the input object is scaled by a scaling
factor s, it can be shown$^{12}$ that the line segment would
map to a new point $(\theta',\rho')$ given by

$$\rho' = s\rho \quad \text{and} \quad \theta' = \theta. \qquad (14)$$

Equation (14) defines a transformation that maps a
point $(\theta,\rho)$ in the Hough space to a point
$(\theta',\rho') = (\theta,s\rho)$ in the transformed space. Equation (14)
shows that the transformation is the same (and hence
shift-invariant) for each $\theta$, but it is different (and hence
not shift-invariant) for each $\rho$. In other words, the
transformation is shift-invariant for translations along
the $\theta$ axis. However, it is not shift-invariant for
translations along the $\rho$ axis. Therefore, this
transformation is quasi-shift-invariant.

Similarly, if $(\theta,\rho)$ is a point in the Hough space
corresponding to a line segment in the input object
centered at the origin and if the input object is rotated
by an angle $\phi$, it can be shown$^{12}$ that the line segment
would now map to a different point $(\theta',\rho')$ in Hough
space given by

$$\rho' = \rho \quad \text{and} \quad \theta' = \theta + \phi. \qquad (15)$$

Equation (15) represents a transformation that can be
performed in Hough space to search for different input
object rotations. It represents a shift in the Hough
space along the $\theta$ axis. Since the shift is independent
of the position of the point, it is a shift-invariant
transformation.

Finally, it can also be shown$^{12}$ that if the input object
is translated by $(x_0,y_0)$, the point $(\theta,\rho)$ will now map to
the point $(\theta',\rho')$ given by

$$\rho' = -\rho - t\cos(\theta - \alpha) \ \text{and} \ \theta' = \theta + \pi \quad \text{if } \rho + t\cos(\theta - \alpha) < 0,$$

$$\rho' = \rho + t\cos(\theta - \alpha) \ \text{and} \ \theta' = \theta \quad \text{if } \rho + t\cos(\theta - \alpha) \ge 0, \qquad (16)$$

where

$$t = (x_0^2 + y_0^2)^{1/2}, \qquad \alpha = \tan^{-1}(y_0/x_0). \qquad (17)$$

Equation (16) represents a shift along the $\rho$ axis. The
shift is not uniform for all points, but it is the same for
all points that have the same $\theta$ value. Thus it is a
quasi-shift-invariant transformation.
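
The translation rule can be sketched in code as follows. The sketch reads the rule as: add t cos(theta - alpha) to rho, and if the result is negative, negate it and add pi to theta, where t and alpha are the length and direction of the translation vector. The function name is hypothetical, and `math.atan2` is used in place of a plain arctangent of a ratio so that all quadrants are handled; that substitution is an assumption of this sketch.

```python
import math

def translate_hough_point(theta, rho, x0, y0):
    """Map the Hough point (theta, rho) of a line to that of the line shifted by (x0, y0)."""
    t = math.hypot(x0, y0)        # length of the translation vector
    alpha = math.atan2(y0, x0)    # direction of the translation vector
    shifted = rho + t * math.cos(theta - alpha)
    if shifted < 0:
        # Keep rho non-negative by flipping the normal direction.
        return (theta + math.pi, -shifted)
    return (theta, shifted)

# The vertical line x = 1 has theta = 0, rho = 1.
print(translate_hough_point(0.0, 1.0, 2.0, 0.0))   # line x = 3
print(translate_hough_point(0.0, 1.0, -3.0, 0.0))  # line x = -2
```

The change in rho depends only on theta, so all Hough points in the same theta column receive the same shift.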

The above transformations for rotation and shift
both require the Hough space to be scanned in the
direction of the $\rho$ axis and are shift invariant in $\rho$.
Thus they can be combined into one quasi-shift-invariant
transformation. By performing these transformations
for various values of the distortion parameters
and comparing (matching) the resultant transformed
HT patterns with the HT patterns of various
reference objects, we can identify the object in the face
of in-plane distortions and also determine its distortion
parameters. The associative memory architectures
as detailed in Sec. IV can perform thes
mations  very  efficiently  and  fast.  (We 
changes  in  scale  as  changes  in  the  curve  p 
and  search  them  by  varying  the  curve  de 
One  measure  of  how  well  two  HT  patterns  m 
point-by-point  product  of  the  two  HTs.  Th 
is  also  the  correlation  value  of  the  two  HT  [ 
the  origin.  Thus  the  matching  can  be  d< 
optical  correlation  architecture.  For  the  a 
1-D  shift  search  of  the  HT  of  the  input  mu 
pared  vs  several  reference  HT  patterns,  a  r 
nel  AO  architecture  is  possible.  Such  a  1  -D 
case,  as  we  have  shown.  If  the  correlation  ' 
such  comparisons  exceeds  a  predetermined 
the  object  is  identified.  However,  comparir 
terns  for  all  possible  distortions  and  class 
object  is  not  a  trivial  task  (even  with  the 
parallelism of optics). Fortunately, this pr
be  overcome  by  treating  the  input  object  as 
arbitrary  shape  and  using  the  procedure  de 
the  next  section. 

E.  Transformations  for  Detecting  Curved  Objec 

The  normal  straight-line  HT  space  can  al 

for  curve  detection.  In  this  case,  we  first  i 
description  of  the  curve  in  terms  of  the  norn 
eters  p  and  0.  Let  this  description  be 

$$\rho = f(a_1, \ldots, a_n, \theta), \qquad (18)$$

where  o , , .  . .  are  the  parameters  of  the  cu 
description  is  a  set  of  peaks  in  a  2-D  norma 
line  HT  of  the  input  curve  after  thresholding 
the  points  below  a  threshold  to  zero  and  ki 
grey-level  values  of  points  above  the  thresl 
detect  a  curve  and  its  parameters  in  an  input 
first  form  the  normal  straight-line  HT  of 
pattern  and  threshold  it.  We  then  perforr 
shift-invariant  linear  transformation  of  tl 
space  given  by 

$$\rho' = \rho - f(a_1, \ldots, a_n, \theta - \phi) \qquad (19)$$

and  then  an  inverse  Hough  transform.1 1  Th 

$a_1, \ldots,$ and $a_n$ used in Eq. (19) are the pan

the curve being searched for and $\phi$ its rotat
The  presence  of  a  peak  in  the  inverse  HT  spi 
fies  the  object.  The  parameters  used  in  the 
mation  in  Eq.  (19)  (that  yield  a  peak  in  tl 
Hough  space)  identify  the  parameters  of  tl 
curve.  Scale  changes  are  viewed  as  chani 
values  of  the  curve's  parameters  The  j. 
the  peak  in  the  inverse  HT  space  defines  t 
center,1 1 i.e., its shift $(x_0,y_0)$. Thus this
allows us to identify a curve's shape, its pt
and  its  shift  and  rotation.  Use  of  this  techni 
detection  of  missile  trajectories  has  been 
elsewhere.17 

F.  Inverse  Hough  Transform 

As  a  final  example  of  a  quasi-shift-invari 
formation,  we  consider  the  inverse  HT  noi 
For  a  normal  straight-line  HT,  the  inverse 
Fig. 3. Example of the quasishift invariance of the inverse HT: (a)
HT space; (b) inverse HT space.


each point $(\theta,\rho)$ in the Hough space to a straight line in
the (x,y) space (see Fig. 3). It is obvious that if the
input pixel at $(\theta,\rho)$ is translated along the $\rho$ axis, the
straight line to which it maps is merely translated in
the direction of its perpendicular, and its slope does
not change (see Fig. 3). Thus this transformation is
shift-invariant in the direction of the $\rho$ axis. If this
transformation is implemented using an associative
memory, it will be quasishift-invariant if we scan the
input HT along the $\rho$ axis, and the $y_i$ terms that are
shifted versions of one another will be contiguous.
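
The effect of a shift along the $\rho$ axis on the inverse mapping can be checked with a small sketch. It represents the output line by sample points on x cos(theta) + y sin(theta) = rho; the function name and the numerical values are illustrative assumptions.

```python
import math

def inverse_ht_line(theta, rho, offsets):
    """Sample points on the line x cos(theta) + y sin(theta) = rho."""
    foot = (rho * math.cos(theta), rho * math.sin(theta))  # point nearest the origin
    direction = (-math.sin(theta), math.cos(theta))        # unit vector along the line
    return [(foot[0] + s * direction[0], foot[1] + s * direction[1])
            for s in offsets]

theta, rho, d = math.pi / 3, 2.0, 1.5   # hypothetical Hough point and rho shift
offsets = [-1.0, 0.0, 1.0]
line_a = inverse_ht_line(theta, rho, offsets)
line_b = inverse_ht_line(theta, rho + d, offsets)

# Each point of the shifted line is the original point moved by d along the normal.
normal = (math.cos(theta), math.sin(theta))
for (xa, ya), (xb, yb) in zip(line_a, line_b):
    print(round(xb - xa - d * normal[0], 12), round(yb - ya - d * normal[1], 12))
```

The residuals are zero to rounding, so the output line keeps its slope and is only translated along its perpendicular.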

Section IV describes how the shift-invariant and
quasi-shift-invariant transformations can be achieved
optically using acoustooptic cells. We use the HT
transformations described in this section for specific
case studies. Section V advances an associative
processor system that is capable of curved object
identification and location. The system uses the straight-line
HT and the Hough space transformations described in
this section.

IV.  Optical  Realization  of  the  Associative  Processor 

In this section, we show how a low-level processor
based on (quasi)shift-invariant linear transformations
can be optically realized using acoustooptic (AO) cells.
It is a low-level processor, in the sense that it operates
on raw image data, extracting local low-level iconic
image features (e.g., lines, edges, and their slopes), and
preserves most of the input data information. We first
describe an architecture that can perform general
linear shift-invariant transformations. We then describe
a different architecture, which is capable of
performing general quasishift-invariant transformations.
We would like to restate that these architectures
are capable of realizing any general linear
shift-invariant and quasishift-invariant transformations,
but we focus our attention on the normal straight-line
HT, because it can be used to recognize objects of
arbitrary shapes, and it can be made distortion-invariant.
As noted in the previous section, the transformations
required to achieve this are quasishift-invariant
and can be easily achieved using the architecture
suggested in this section. However, we recommend
obtaining the normal straight-line HT using the rotating
prism method$^{10}$ because we need to sample the input in
a  polar  fashion  if  we  want  to  use  the  associative  proces¬ 


Fig. 4. Optical realization of shift-invariant transformations.


sor  architecture  suggested  in  this  section  to  ge 
the  HT. 

We  see  from  Eq.  (8)  that  the  output  of  am 
linear  transformation  is  given  by  the  sum  of  the 
ence  output  vectors  weighted  (multiplied)  by  tl 
responding  elements  of  the  input  vector.  If  the 
ence  output  vectors  y.  are  shifted  versions  i 
another, we can achieve this linear transformat
the  simple  optical  matrix-vector  processor  she 
Fig.  4.  This  architecture  consists  of  a  point  mod 
at  plane  P |.  the  output  of  which  is  expanded  to  i 
nate  uniformly  an  AO  cell  at  plane  P>.  The 
leaving  the  AO  cell  is  then  imaged  onto  a  1-D  de 
array  at  plane  P  ■  which  integrates  in  time.  1 
assume  that  the  AO  cell  can  be  divided  into  M 
lengthwise,  where  M  is  the  number  of  elements 
The  vector  y>  is  first  fed  to  the  AO  cell,  and  tl 
ments  of  the  input  vector  x  are  fed  to  a  point  mi 
tor  at  P\.  Asy.  propagates  downward  in  the  AO 
automatically  creates  y_>.  y:i,  etc.  as  these  are  s 
versions  of  yi-  Thus,  by  pulsing  Pt  with  the  ele 
of  x  at  intervals  of  TJM  (where  T  \  is  the  apertur 
length  of  the  AO  cell)  and  time-integrating  c 
detectors  at  P  over  .V  intervals  each  7',,/A/ ,  we  a< 
the  weighted  sum  of  the  y,  as  required  by  Eq.  (8). 
the  case  of  linear  shift-invariant  transformation: 
M,  as  noted  in  Sec.  II.  D.)  Since  w^e  have  to  load  i 
the  AO  cell  before  we  can  start  the  computatior 
total  time  T  required  to  obtain  the  output  is 

$$T = T_A + N\,T_A/M = (1 + N/M)\,T_A. \qquad (20)$$

In practice, the AO cell cannot be divided into M
[M is the time-bandwidth product (TBWP) of tl
cell], because M is rather large for most cases,
example,  consider  the  case  of  a  generalized  H 
circles  for  a  128  X  128  image.  We  have  .V  = 
16,000.  If  the  AO  cell  can  only  accommodate  a  T 
of  m  (where  m  '<  AT),  we  operate  the  processo 
obtain  m  of  the  M  output  elements  at  the  end  of 
m  s.  We  then  shift  out  the  contents  of  the  deti 
and  repeat  the  process  M/m  times  to  obtain  the 
output.  From  Eq.  (20).  the  total  time  T,  tak 
produce  the  output  on  an  m  element  processor  i: 

$$T_1 = (M/m)(1 + N/m)\,T_A. \qquad (21)$$

For N = M = 128 x 128, m = 500, and $T_A$ = 5 μs, w
$T_1 \approx 5$ ms in Eq. (21), and the point modulator at $P_1$ has
to be pulsed with the elements of x at a rate $m/T_A$ =
100 MHz. This is a very realistic data rate for the
point modulator and AO cell.
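
The timing figures quoted above can be reproduced with a short calculation. This sketch reads Eq. (20) as T = (1 + N/M) T_A (one aperture time to load the cell plus N pulse intervals of T_A/M) and Eq. (21) as T_1 = (M/m)(1 + N/m) T_A for a cell limited to a time-bandwidth product of m; those readings and the function names are assumptions made for illustration.

```python
def time_full_aperture(N, M, T_A):
    """Load time T_A plus N pulse intervals of T_A / M each."""
    return (1 + N / M) * T_A

def time_limited_tbwp(N, M, m, T_A):
    """M/m passes, each taking a load time T_A plus N intervals of T_A / m."""
    return (M / m) * (1 + N / m) * T_A

N = M = 128 * 128      # input and output vector lengths for a 128 x 128 image
m = 500                # time-bandwidth product of the AO cell
T_A = 5e-6             # aperture time of the AO cell, in seconds

T_1 = time_limited_tbwp(N, M, m, T_A)
pulse_rate = m / T_A

print(f"T_1 = {T_1 * 1e3:.2f} ms")
print(f"pulse rate = {pulse_rate / 1e6:.0f} MHz")
```

With these values the total time comes out at roughly 5.5 ms and the modulator rate at 100 MHz, in line with the figures in the text.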

The above architecture realizes a shift-invariant
linear transformation. However, if we are using the
normal straight-line HT for object recognition, many of
the transformations that we need to perform in the
Hough space are quasi-shift-invariant. We now
consider the use of multichannel AO cells to achieve
quasi-shift-invariant transformations. The architecture we
consider is shown in Fig. 5. Similar architectures have
been suggested for high-accuracy vector inner product
processors.$^{18}$ The input plane $P_1$ consists of a row of
$N_c$ point modulators, where $N_c$ is the number of
channels in the AO cell. The multichannel AO cell is
placed at $P_2$. Each channel consists of m regions
(TBWP = m) as in the previous architecture. The
light from each point modulator is expanded to
illuminate a corresponding AO cell channel, and the light
leaving the different channels is summed and imaged
onto a 1-D detector array as shown. Let the number of
partitions in the memory matrix Y be $N_p$, as discussed
in Sec. II.E, where the $y_i$ terms in each partition are
shifted versions of one another. Let us assume that we
have an AO cell with $N_c = N_p$. We feed one $y_i$ (the first
$y_i$) of each partition to one of the $N_c$ different AO
channels. Each AO channel is assumed to have a
TBWP of m. The input vector x is also partitioned
(and rearranged in some cases) so that
$x = [x_1^T, x_2^T, \ldots, x_{N_p}^T]^T$, as detailed in Sec. II.E. These $x_i$
terms are time-sequentially fed to the corresponding
$N_c$ point modulators. The system in Fig. 5 can be
thought of as an $N_c$-channel version of the one in Fig. 4,
with the $N_c$ outputs summed into a common detector
array. The different $y_i$ terms in different channels
produce the different terms in Eq. (10), as they
propagate through the different channels. Thus the whole
matrix-vector product is achieved at the end of
$(T_A/m)n$ s, where n is the maximum number of $y_i$ terms in
any partition (i.e., the maximum number of shifted
versions of the $y_i$ needed in any partition). As in Eq.
(21), we repeat this (M/m) times for M element
outputs greater than the TBWP = m of the AO cell. The
number of partitions can be greater than the number of
channels. In this case, we repeat the above procedure
$N_p/N_c$ times to achieve the complete matrix-vector
product in Eq. (10). Therefore, the total time $T_2$


required  to  complete  the  transformation  is  gi 

$$T_2 = (N_p/N_c)(M/m)(1 + n/m)\,T_A. \qquad (22)$$

If the number of $y_i$ terms in each partition is t
$n = N/N_p$. As an example, we consider u
processor to compute the inverse HT. We
the case when N = 72 x 25 (the size of the HT s
= 128 x 128 (the size of the image or inverse H
$N_p$ = 72 (number of partitions, one for each $\theta$ v
= 86 (number of channels in the AO cell),
(TBWP of each AO cell), and $T_A$ = 5 μs. For
Eq. (22) gives $T_2 \approx 330$ μs. Therefore, the
architecture is quite fast and realistic.

V. Proposed Low-Level Optical Associative Processor

Figure  6  shows  the  block  diagram  of  the  ] 
low-level  optical  associative  processor.  The 
line  HT  of  the  input  image  is  first  computed, 
be  achieved  in  a  variety  of  ways,  including 
method  presented  in  this  paper.  It  is,  howeve 
able  to  use  the  rotating  prism  method.'"  (Tf 
to  be  the  most  practical  technique,  since  the  A 
od  requires  that  we  scan  the  input  image  ir 
fashion  or  perform  a  rectangular-polar  trai 
tion.)  The  HT  obtained  is  operated  on  by  an 
tive  processor  (performing  quasishift-invaria 
formations)  to  determine  the  curve  parame 
the  rotation  value  for  the  curve.  The  opera 
quired  on  the  HT  are  linear  and  (quasi)shift-ii 
as  explained  in  Sec.  III.  Hence  the  architect 
gested  in  Fig.  5  can  be  used  to  perform  thes 
tions.  The  same  architecture  is  then  used  to 
the  inverse  HT  to  provide  the  translation  pa 
of  the  object.  Thus  this  processor  can  be  reali 
one  HT  unit  and  two  AO  cell  AM  units  of 
shown  in  Fig.  5. 

Some  advantages  of  using  this  technique  a 
below.  We  use  the  normal  straight-line  HT 
objects  of  all  shapes  and  thus  avoid  the  use  of 
generalized  'transforms  for  objects  of  differen 
The  transformed  spaces  are  always  2-D,  whi< 
simpler  and  more  efficient  use  of  memory.  T 
od  also  works  for  multiple  objects  and  partia 
Other  associative  memory  techniques  first  u; 
toassociative  memory  to  map  partial  object 
objects  and  then  a  heteroassociative  memory 
object  identification.  Since  our  method  work 
tial  objects,  we  do  not  need  the  autoassociativ 
ry. The use of different transformations in the Hough
space  and  an  inverse  transform  to  achieve  the  object 
identification  and  location  in  our  associative  memory 
mapping  is  more  efficient  than  a  more  conventional 
autoassociative  memory  followed  by  a  conventional 
heteroassociative  memory  that  maps  every  possible 
distorted  version  of  the  various  objects  to  appropriate 
output  vectors. 

VI. Summary and Conclusions

In this paper, a new approach to achieving linear
transformations on 2-D images using associative memories
has been suggested. We have shown how general
linear transformations can be viewed as associative
memories. We have detailed the different linear
transformation operations required for the case of an
HT feature space for pattern recognition and how each
can be achieved by an AM processor. These include
the Hough transform, the HT space transformations,
and an inverse Hough transform. The construction of
the memory matrix required for each associative memory
processor has been detailed, and an architecture
for its optical realization has been suggested. The
architecture is simple, elegant, and capable of real-time
processing for shift-invariant as well as
quasi-shift-invariant linear transformations. We have thus
suggested a low-level associative processor that uses
linear transformations for the recognition and location
of curved objects. The processor can be implemented
optically.


The  support  of  this  research  by  a  grant  from  the  Air 
Force  Office  of  Scientific  Research  (grant  AFOSR-84- 
0293)  as  well  as  partial  support  from  the  Office  of 
Naval Research is gratefully acknowledged. Fruitful
discussions with Bradley Taylor are also acknowledged.





APPLIED OPTICS / Vol. 26, No. 17 / 1 September 1987


AFOSR-81-0293,  Annual  Report 

9.  STORAGE  CAPACITY  AND  DECISION 
MAKING  ASPECTS  OF  OPTICAL 
ASSOCIATIVE  PROCESSORS 


Vol. 646
November 1987


Associative  Memory  Synthesis,  Performance,  Storage  Capacity 
and  Updating:  New  Heteroassociative  Memory  Results 

David  Casasent  and  Brian  Telfer 
Carnegie  Mellon  University 
Center for Excellence in Optical Data Processing
Department  of  Electrical  and  Computer  Engineering 
Pittsburgh,  PA  15213 


ABSTRACT 

The storage capacity, noise performance, and synthesis of associative memories for image analysis
are considered. Associative memory synthesis is shown to be very similar to that of linear
discriminant functions used in pattern recognition. These lead to new associative memories and new
associative memory synthesis and recollection vector encodings. Heteroassociative memories are
emphasized in this paper, rather than autoassociative memories, since heteroassociative memories
provide scene analysis decisions, rather than merely enhanced output images. The analysis of
heteroassociative memories has been given little attention. Heteroassociative memory performance
and storage capacity are shown to be quite different from those of autoassociative memories, with
much more dependence on the recollection vectors used and less dependence on M/N. This allows
several different and preferable synthesis techniques to be considered for associative memories. These
new associative memory synthesis techniques and new techniques to update associative memories are
included. We also introduce a new SNR performance measure that is preferable to conventional noise
standard deviation ratios.

1.  INTRODUCTION 

Much has been written about associative memory storage capacity and the recollection and error
correction properties of such memories. Section 2 reviews associative memory synthesis, several of the
neural and other associative memory models suggested, and advances initial remarks on the storage
capacity of associative memories. The similarity of associative memory matrix rows to pattern
recognition linear discriminant functions (LDFs) is included. The assumptions on the key vectors in
the different associative memories are also noted, since this is not generally given proper attention.
As we shall see, most work has considered autoassociative memories (AAMs). In Section 3, we derive
expressions to show that heteroassociative memory (HAM) performance and storage capacity are quite
different from those of AAMs. We also advance new and preferable performance measures to be
employed in comparing such memories. Quantitative supporting data on HAM and AAM
comparative noise performance and storage capacity are then advanced in Section 4. We conclude
(Section 5) with initial remarks on different associative memory synthesis techniques to provide
updating and altering of associative memories. Our work and attention to HAMs is especially
important in image analysis, image understanding and pattern recognition, rather than image
reconstruction and image enhancement as is generally the AAM case considered.


2.  SYNTHESIS  AND  STORAGE


2.1  TERMINOLOGY  AND  PSEUDOINVERSE  ASSOCIATIVE  MEMORIES 

In our notation, the input key vectors $\mathbf{x}_k$ are of dimension $N$, the output recollection
vectors $\mathbf{y}_k$ are of dimension $K$, there are $M$ key/recollection vector pairs, and the
associative memory matrix $\mathbf{M}$ is $K \times N$. An associative memory is intended to output a
recollection vector $\mathbf{y}_k$ that is closest to or most closely associated with a given input key
vector $\mathbf{x}_k$, i.e. we desire $\mathbf{M}\mathbf{x}_k = \mathbf{y}_k$ for all $k = 1$ to $M$.
If we form the key vector matrix $\mathbf{X}$ of size $N \times M$ (with the $\mathbf{x}_k$ as its
columns) and the recollection vector matrix $\mathbf{Y}$ of size $K \times M$ (with the $\mathbf{y}_k$
vectors as its columns), the associative memory must satisfy $\mathbf{M}\mathbf{X} = \mathbf{Y}$. If
$\mathbf{X}$ is square and non-singular, the solution to this can be written as

$$\mathbf{M} = \mathbf{Y}\mathbf{X}^{-1}. \tag{1}$$

Generally $\mathbf{X}$ is not square and this solution is not of practical use. The typical solution used is

$$\mathbf{M} = \mathbf{Y}\mathbf{X}^{+}, \tag{2}$$

where the pseudoinverse of $\mathbf{X}$ is

$$\mathbf{X}^{+} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} \tag{3}$$

and where the vector inner product (VIP) matrix is

$$\mathbf{V} = \mathbf{X}^{T}\mathbf{X}. \tag{4}$$

The data matrix is denoted as $\mathbf{X}^{T}$ (it has the $\mathbf{x}_k$ key vectors as its row vectors).
We note that when the $\mathbf{x}_k$ vectors are orthonormal, then $\mathbf{V} = \mathbf{I}$ and
$\mathbf{X}^{+} = \mathbf{X}^{T}$. The solution in Eq. (2), with $\mathbf{X}^{+}$ given by Eq. (3), is
useful since $\mathbf{X}^{T}\mathbf{X}$ is a square matrix and hence it has an inverse (if the
$\mathbf{x}_k$ are linearly independent, in which case $\mathbf{V}$ is of full rank). Thus, this
solution in Eq. (3) is only possible when the $\mathbf{x}_k$ are linearly independent. In other cases,
$\mathbf{X}^{+}$ must be calculated using singular value decomposition and other advanced techniques,
which first produce a set of orthogonal vectors, or which form separate linear discriminant functions
(each of which is a row of the associative memory matrix). The pseudoinverse solution is an exact
solution if the $\mathbf{x}_k$ are linearly independent (and in this case the simple $\mathbf{X}^{+}$
solution noted in Eq. (3) can be used). This pseudoinverse solution in Eq. (2) is the minimum mean
square error (MSE) solution that minimizes $\|\mathbf{Y} - \mathbf{M}\mathbf{X}\|^{2}$. In cases when
Eq. (3) can be used, $\|\mathbf{Y} - \mathbf{M}\mathbf{X}\|^{2} = 0$. If the $\mathbf{x}_k$ are
orthonormal, then $\mathbf{X}^{+} = \mathbf{X}^{T}$ and calculation of the memory matrix $\mathbf{M}$
is trivial. When $M < N$, there are more unknowns than equations, and an infinite number of solutions
exist (the underdetermined problem) and Eq. (2) is one of these solutions. This pseudoinverse solution
is the minimum norm solution [17] to $\mathbf{M}\mathbf{X} = \mathbf{Y}$, i.e. it is the solution whose
outputs $\mathbf{y}_k$ are the least affected by input perturbations.
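The synthesis in Eqs. (2)-(4) can be illustrated with a small numerical sketch. The toy key and recollection vectors below (N = 3, M = 2, K = 2) and all function names are assumptions chosen for illustration and are not taken from the paper. The closed-form 2 x 2 inverse restricts the sketch to two linearly independent key vectors.

```python
# Minimal pseudoinverse heteroassociative memory (HAM) sketch in pure Python.
# Assumed toy data: N = 3, M = 2 key/recollection pairs, K = 2.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse_2x2(v):
    (a, b), (c, d) = v
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("VIP matrix is singular: keys are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

def pseudoinverse_memory(X, Y):
    """Return M = Y (X^T X)^{-1} X^T for a key matrix X with two columns."""
    Xt = transpose(X)
    V = matmul(Xt, X)                     # VIP matrix, Eq. (4)
    X_plus = matmul(inverse_2x2(V), Xt)   # pseudoinverse, Eq. (3)
    return matmul(Y, X_plus)              # memory matrix, Eq. (2)

# Keys are the columns of X (N x M); recollections are the columns of Y (K x M).
X = [[1.0, 1.0],
     [0.0, 1.0],
     [2.0, 0.0]]
Y = [[1.0, 0.0],
     [0.0, 1.0]]

memory = pseudoinverse_memory(X, Y)
recalled = matmul(memory, X)              # reproduces Y for independent keys
```

Because the two assumed keys are linearly independent, `recalled` matches `Y` to within rounding, which is the exact-solution case described above.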




The associative memory described above is a HAM. The typical associative memory discussed is
the AAM. In this memory, the prior discussion is still valid with $\mathbf{Y} = \mathbf{X}$ and
$\mathbf{M} = \mathbf{X}\mathbf{X}^{+}$ (thus, the AAM is a special case of the HAM). We feel that more
attention should and must be given to HAMs. Kohonen [1] discusses $\mathbf{X}\mathbf{X}^{+}$ as the
orthogonal projection operator, where the output vector $\mathbf{y}$ produced is a linear combination of
the key vectors with minimum MSE for the case of an AAM.

The AAMs and HAMs described above are the most common associative memories discussed [1].
The use of the data matrix $\mathbf{X}^{T}$ as an associative memory has also been suggested and shown
to be a preferable nearest neighbor associative memory for binary [2] and gray scale key vectors. The
technique by which the associative memory is formed can be used to distinguish different associative
memory systems. In one model [4,5,6], the memory is formed from data matrices of the key and
recollection vectors in a VIP processor. The most common synthesis technique discussed forms the
matrix as the sum of the vector outer products of key and recollection vector pairs [1]. Some specific
associative memories [8] restrict the key vector elements to be 0 or $\pm 1$. In synthesis, they sum the
vector outer product (VOP) of each vector pair and quantize the final matrix to 0 or $\pm 1$. In other
cases, the diagonal elements of the memory matrix are set to 0 (usually to model neural networks). In
some memories, recollection occurs after one matrix-vector multiplication. In other cases, the output
from each matrix-vector multiplication is thresholded and fed back to the input of the system, and
the final recollection output is obtained only after several iterations. In one of the most popular
associative memories, the Hopfield memory [9,10], the key and recollection vectors are bipolar binary
and the diagonal elements of the matrix are 0. Some associative memories require sparse key vectors
for efficient recall. Most associative memories are synthesized as matrix-vector processors. However,
analogous holographic associative memory synthesis techniques also exist [11,12,13].

Thus, there are a large variety of associative memories. We consider HAMs and gray-level
memories and key vectors. Our general preference in image analysis is to use input key vectors
that have no unrealistic constraints (such as linear independence, orthogonality, etc.). In a subsequent
paper, we detail techniques to achieve this and provide examples of ways to achieve the more
important property of shift invariance in associative processors intended for image processing.


2.2  KEY  VECTOR  REQUIREMENTS 

Generally, key vector image inputs cannot be assumed to be linearly independent, and thus the
practical use of associative memories for such image data is of concern. In some cases, linear
independence may occur, of course, but this cannot be guaranteed. If the $\mathbf{x}_k$ are image domain
vectors (i.e. lexicographically ordered images), and if $M \ll N$, then often we will find that the
$\mathbf{x}_k$ are independent, or at least there is a reasonable assurance that this will occur.
However, we note that there is no guarantee of this. If the $\mathbf{x}_k$ are feature vectors, then
generally $M > N$ and the key vectors are linearly dependent. For the more practical and general case of
linearly dependent key vectors, one can employ singular value decomposition [14]. This algorithm
produces orthogonal vectors and for the case of linearly independent key vectors it addresses practical
numerical stability issues associated with calculations of the inverse of $\mathbf{V}$. This merits
attention, since the condition number of $\mathbf{V}$ is the square of that of the matrix $\mathbf{X}$.
The problem with the SVD technique is its high numerical computational load, which precludes its use
in real time and its use for updating associative memory matrices. A modified Karhunen-Loeve
approximation to $\mathbf{X}^{+}$ developed for image [illegible] synthetic discriminant functions is
quite useful here also [15]. It allows operation on [illegible] linearly dependent key vectors. The
technique used is to calculate the eigenvectors of the correlation matrix from the much smaller
dimensionality VIP matrix. We do this for the key vectors for each class. We retain only several
(typically 3) eigenvectors per class. We then orthogonalize the eigenvectors from all classes (using
Gram-Schmidt (GS) or related techniques). All of these calculations are performed in the reduced VIP
space, hence allowing real time calculations. The memory can then be easily described in terms of the
original higher dimensionality image space. These final eigenvectors are then used as the rows of the
associative memory matrix. We refer to this as the VIP-GS associative memory synthesis technique [3].

The direct synthesis of an associative memory as the sum of vector outer products of each
key/recollection vector pair requires orthonormal key vectors (and will not yield correct results even
for linearly independent key vectors, since
$\mathbf{M} = \mathbf{X}\mathbf{X}^{+} = \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} = \mathbf{X}\mathbf{X}^{T}$
only for orthonormal vectors). Similarly, the simple VIP synthesis of an associative memory also requires
orthonormal key vectors. However, when a nonlinearity is used at the intermediate plane [4], where
the product of the input vector and the data matrix is formed, the requirement of orthonormal key
vectors can be reduced. However, if the key vectors are only restricted to be linearly independent,
this method will still not achieve proper results. The VIP-GS synthesis technique and the iterative
Widrow-Hoff are two very attractive and real time techniques for associative memory synthesis in the
practical case of linearly dependent key vectors.
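The orthonormality requirement of the outer-product synthesis can be checked numerically. The sketch below forms the sum of vector outer products (equal to the recollection matrix times the transposed key matrix) for two hypothetical key sets, one orthonormal and one only linearly independent, and measures the worst recall error in each case. The data and function names are illustrative assumptions.

```python
# Outer-product (VOP) synthesis: exact recall needs orthonormal keys.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def outer_product_memory(X, Y):
    """Sum over k of y_k x_k^T, which equals Y X^T."""
    return matmul(Y, transpose(X))

def max_recall_error(memory, X, Y):
    """Largest absolute difference between the recalled and desired outputs."""
    recalled = matmul(memory, X)
    return max(abs(r - y)
               for rec_row, y_row in zip(recalled, Y)
               for r, y in zip(rec_row, y_row))

Y = [[1.0, 0.0],
     [0.0, 1.0]]

# Orthonormal keys (columns of X): recall is exact.
X_orthonormal = [[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]]

# Linearly independent but not orthonormal keys: cross-talk appears.
X_independent = [[1.0, 1.0],
                 [0.0, 1.0],
                 [2.0, 0.0]]

err_orthonormal = max_recall_error(
    outer_product_memory(X_orthonormal, Y), X_orthonormal, Y)
err_independent = max_recall_error(
    outer_product_memory(X_independent, Y), X_independent, Y)
```

For the second key set the recalled matrix is the VIP matrix itself rather than the identity, so the error is nonzero even though the keys are linearly independent.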


2.3  ANALOGY  WITH  PATTERN  RECOGNITION  LINEAR  DISCRIMINANT
FUNCTIONS  (LDFs)


We now discuss how the different solutions to $\mathbf{M}\mathbf{X} = \mathbf{Y}$ are related to
different pattern recognition LDFs. For linearly independent key vectors, the pseudoinverse solution is
related to various synthetic discriminant functions (SDFs) [15] for distortion-invariant pattern
recognition, i.e. the outputs from the pattern recognition system are analogous to the recollection
vectors $\mathbf{y}_k$ in associative memories and the key vector input images $\mathbf{x}_k$ are
analogous to the images to be classified independent of distortions, etc. To see this, we consider the
filter function (or associative memory vector) to be a linear combination of several key vectors, i.e.

$$\mathbf{h} = \sum_j a_j \mathbf{x}_j = \mathbf{X}\mathbf{a}, \tag{5}$$

where $\mathbf{X}$ has the training images or key vectors $\mathbf{x}_j$ as its columns and the vector
$\mathbf{a}$ has as its elements the coefficients $a_j$ that describe the filter function $\mathbf{h}$.
This filter is the solution $\mathbf{a} = \mathbf{V}^{-1}\mathbf{u}$ to $\mathbf{V}\mathbf{a} = \mathbf{u}$,
where $\mathbf{V} = \mathbf{X}^{T}\mathbf{X}$ is the VIP matrix and $\mathbf{u}$ is the vector of desired
outputs whose bit code denotes the class of the input key vector $\mathbf{x}$ under test. The filter
function, when written as a row vector, is thus the following solution

$$\mathbf{h}^{T} = \mathbf{u}^{T}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} = \mathbf{u}^{T}\mathbf{X}^{+}. \tag{6}$$

This solution is the same as Eq. (2), where each row in the pseudoinverse memory is a given
$\mathbf{h}$ filter with the corresponding row of $\mathbf{Y}$ given by the row vector $\mathbf{u}^{T}$
output encoding. The use of $K$ multiple SDFs ($\mathbf{h}_1$ to $\mathbf{h}_K$) with different output
codings $\mathbf{u}_k$ or the analogous associative memory can thus be used to distinguish different
versions of one class of an object and to discriminate it from other classes. This analogy is most
attractive, since the $\mathbf{h}_k$ filters synthesized above can be modified to allow different
distorted versions of one object (e.g. several input key vectors) to be associated with the same encoded
output (e.g. the same recollection vector), which will now denote the class or subset of several input
key vectors (i.e. all distorted versions of an input can be assigned the same recollection vector).

Incorporation of these pattern recognition techniques into associative memory synthesis allows
significantly different recollection vector encodings from the conventional unit vector ones to be
employed. Incorporation of these new recollection vector encodings and the associated new associative
memory synthesis techniques allows the size of the matrix to be significantly reduced and it adds a
distortion-invariant property to the associative memory. As we will show, the use of such encoding
techniques actually provides improved noise and storage capacity performance over the conventional
unit vector HAMs. We note that for the SDF solution, we must be able to invert $\mathbf{V}$ and thus
this technique also requires linearly independent key vectors, or the use of advanced techniques in the
synthesis of such filters. We also note that many pattern recognition preprocessing techniques have
been described to achieve the necessary preprocessing to provide linearly independent as well as
orthogonal key vectors. Many of these techniques are off-line. However, when the associative
memory need not be updated, these synthesis techniques are appropriate.


We now consider the analogy between MSE associative memories with linearly dependent key
vectors and the typical MSE LDFs used in pattern recognition when the $\mathbf{x}_k$ are feature vectors.
In this case, the LDFs are denoted by $\mathbf{w}_k$ and the VIP projection values
$\mathbf{w}_k^{T}\mathbf{x}$ determine the region in a hyperspace in which the input key vector lies and
hence determine the class of the input data. We note that there is no assurance that even the training
set data will be correctly classified by this technique (since this is an approximate rather than an
exact solution).


Various LDF techniques to calculate associative memories are now summarized. In each case, we
calculate an LDF $\mathbf{w}_k$ and use it as a row of our associative memory matrix. We design this LDF
to yield an output of "1" for certain classes and an output of "0" for the other classes (i.e. according
to the coding desired and required for that row of the matrix). A multi-class problem is addressed by
specifying two classes for each LDF, with each of these two classes being subsets or groups of more
than two classes, with the output K-tuple or binary code allowing the final one-of-many class decision
to be made. Use of such techniques allows the application of associative memories to image analysis,
provides distortion invariance, and can significantly increase associative memory storage capacity, as
we will note and quantify. LDFs that can be calculated using the training set in-class and between-class
scatter matrices include the Fisher LDF and the Hotelling LDF.


3.  NOISE  PERFORMANCE  AND  STORAGE  CAPACITY 
OF  ASSOCIATIVE  MEMORIES 


3.1  INTRODUCTION 

This section provides a theoretical analysis of the noise performance and storage capacity of AAMs
and HAMs. We emphasize the difference in AAM and HAM results, the need for a new performance
measure, and how different recollection vector choices significantly improve results. For ease of
reading, details of several important recent results are included in appendices, so that we can
emphasize the key points with a minimum of mathematical digression. We first define terms
and introduce our notation. The input key vector is $\mathbf{x} = \mathbf{x}_k + \mathbf{n}$, where
$\mathbf{x}_k$ is one of the stored key vectors and $\mathbf{n}$ is a noise vector of zero-mean noise
with a covariance matrix $\boldsymbol{\Gamma} = \sigma_i^2\mathbf{I}$. We denote the variance of the
input and output noise by $\sigma_i^2$ and $\sigma_o^2$, where the variance of a random variable $x$ is
$\sigma^2 = E\{x^2\} - E\{x\}^2$. For zero-mean data, $\sigma^2 = E\{x^2\}$. This is the case for the
input and output noise, since the associative memory matrix operator $\mathbf{M}$ is linear (if no
output thresholding is used). We use subscripts to denote specific vectors in a set and superscripts to
denote the elements of a vector. In this notation, $\sigma_i^2 = E\{(n^j)^2\}$ (from the definition of
$\boldsymbol{\Gamma}$) and $\sigma_o^2 = E\{(y^j - y_k^j)^2\}$ (this requires two terms since
$\mathbf{y}$ is not yet known), where $\mathbf{y} = \mathbf{M}\mathbf{x}$, $\mathbf{y}_k$ is the
recollection vector corresponding to the key vector $\mathbf{x}_k$, and the expectation operation is
over only the elements of the vector [...] all vectors $\mathbf{y}_k$.


3.2  PRIOR  RESULTS 


The typical associative memory performance measure used has been $\sigma_o^2/\sigma_i^2$, where small
values of this parameter indicate good performance. Kohonen [1] proved for AAMs that

$$\sigma_o^2/\sigma_i^2 = M/N$$

and reasoned that the result for HAMs would be about the same. Other work [16] showed this to be
incorrect. The documentation of this work is very terse and thus it merits more details, which we
provide. All steps are provided in appendices, with the results highlighted here. Monte Carlo
simulations were performed for the AAM case [16], with the key vectors chosen from a uniform
distribution between $-1$ and $+1$ and with the key vectors required to be linearly independent (we
found this to be a requirement, although it is not noted in the original work). The key test vectors
were formed by adding a zero-mean random variable (with uniform distribution over [...]) to each
element of one of the random reference key vectors. For each associative memory matrix, [...]
key vectors (each of length $N$) were tested using one reference vector with ten different
realizations of noise with the same level $\sigma_i$. We assume that this is what was done in the
reference. Different $M/N$ ratios were tested by fixing $N = 50$ and by varying $M$, with ten
input vectors used to test each memory matrix [16]. For the case of a HAM, each element of each
recollection vector was also chosen from a uniform distribution between $-1$ and $+1$ [...]
recollection vectors had more than one "1" and are thus not unit vectors.

We define the signal power of a vector to be $E\{(x^j)^2\} - E\{x^j\}^2$ [...]
vector. Since the key and recollection vectors were chosen in the same manner in these earlier tests
[16], their signal powers are equal and $\sigma_o^2/\sigma_i^2$ is equivalent to the input-to-output SNR
ratio. We will use this SNR ratio in our later work (Sections 3.4 and 4) as a preferable performance
measure for more practical cases when the input and recollection vectors do not have the same signal
power. The proofs of the various theorems to be advanced in this section and in subsequent ones do not
assume equal signal powers for the key and recollection vectors.


We now state four theorems [16]. Proofs of each are provided in the appendices.

•  Theorem 1: For any matrix recollection $\mathbf{y} = \mathbf{M}\mathbf{x}$, we find
$\sigma_o^2/\sigma_i^2 = N E\{m^2\}$, where $m$ is an element of $\mathbf{M}$ and the expectation is
over all elements of $\mathbf{M}$.

•  Theorem 2: For an AAM with linearly independent key vectors, we find $E\{m^2\} = M/N^2$.

•  Theorem 3: For AAMs with linearly independent key vectors, combining Theorems 1 and
2, we immediately find

$$\sigma_o^2/\sigma_i^2 = M/N. \tag{8}$$

•  Theorem 4: For HAMs, we find

$$\sigma_o^2/\sigma_i^2 = E\{y_{ij}^2\}\,E\{\mathrm{Tr}(\mathbf{V}^{-1})\},$$

where $y_{ij}$ is an element of $\mathbf{Y}$, $\mathbf{V} = \mathbf{X}^{T}\mathbf{X}$, and the trace
(Tr) is the sum of the diagonal elements of the matrix noted in parentheses following this operator.
The first expected value operator is taken over all elements of $\mathbf{Y}$ and both expectation
operators are taken over the entire ensemble of possible key and recollection vectors.
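Theorems 1 to 3 can be checked on a small autoassociative example. The sketch below uses two hypothetical, linearly independent key vectors with N = 4, evaluates the Theorem 1 expression for the resulting projection matrix, and compares it with a Monte Carlo estimate of the output-to-input noise variance ratio. The key vectors, the Gaussian noise model, the noise level, the trial count and the seed are all assumptions made for illustration.

```python
import random

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse_2x2(v):
    (a, b), (c, d) = v
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def noise_ratio(memory):
    """Theorem 1: sigma_o^2 / sigma_i^2 = N * E{m^2} over all matrix elements."""
    K, N = len(memory), len(memory[0])
    mean_square = sum(m * m for row in memory for m in row) / (K * N)
    return N * mean_square

# AAM for M = 2 independent keys (columns of X) of dimension N = 4: M = X X^+.
X = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0],
     [1.0, 1.0]]
Xt = transpose(X)
aam = matmul(X, matmul(inverse_2x2(matmul(Xt, X)), Xt))

predicted = noise_ratio(aam)      # Theorem 3 gives M/N = 2/4 for this case

# Monte Carlo estimate: the memory is linear, so the output noise is aam * n.
rng = random.Random(0)
sigma_i = 0.1
trials = 5000
total = 0.0
for _ in range(trials):
    noise = [[rng.gauss(0.0, sigma_i)] for _ in range(4)]
    out = matmul(aam, noise)
    total += sum(v[0] ** 2 for v in out) / 4   # mean squared output element
measured = total / (trials * sigma_i ** 2)
```

In this sketch `predicted` is determined by the matrix alone, while `measured` fluctuates around the same value with the number of trials.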


3.3  DISCUSSION  AND  ANALYSIS 


Theorem 1 is useful since it applies for any matrix with no key or recollection vector assumptions.
We will use it in developing more general and more easily evaluated expressions of associative memory
performance.

The result in Theorem 3 agrees with that of Kohonen [1], who obtained his result by
different techniques. This result shows and quantifies for linearly independent key vectors (for which
$M \le N$) that $\sigma_o^2/\sigma_i^2 \le 1$, i.e. an AAM always reduces the input noise (or in the
worst case when $M = N$, the input noise is not increased). This also shows that the noise improvement
for an AAM increases as $M/N$ decreases (i.e. as fewer vector pairs $M$ are stored or when larger
dimensionality $N$ key vectors are used). For an AAM design, the amount of input noise expected
$\sigma_i^2$ is specified and the $M/N$ ratio determines the output noise $\sigma_o^2$ one will have to
contend with. In later work, we will quantify the amount of output noise that one can have and achieve
a given probability of correct classification for different output recollection vector encoding schemes.

Theorem 4 shows that the amount of noise reduction in a HAM depends on the key vectors (this
occurs through the $\mathrm{Tr}(\mathbf{V}^{-1})$ term) and that it also depends on the recollection
vector choice (this occurs through the $y_{ij}$ term) and that its performance does not depend as
explicitly on $N$ and $M$ as is the case with an AAM. This is a most significant result, since AAM
storage capacity and noise performance depends only on $M$ and $N$. The remark has been made [16] that
HAM performance can be very poor, even with linearly independent vectors. To see why this might occur,
it is noted that the determinant of $\mathbf{X}^{T}\mathbf{X}$ can be small (even with linearly
independent $\mathbf{x}_k$ vectors). This occurs when $\mathbf{X}^{T}\mathbf{X}$ is nearly singular. In
this case, $\mathrm{Tr}(\mathbf{V}^{-1})$ becomes large and poor performance will result. We note that
poor performance would also result from any associative memory matrix synthesized in the case when
$\mathbf{V}^{-1}$ was hard to compute, i.e. when its condition number was large. We note that the
general AAM performance measure equation does not reflect the effect of the condition number
directly. However, the HAM expressions do reflect this issue, through their dependence on the VIP
matrix $\mathbf{V}$. Thus, it may appear that HAM performance would be poorer than that of AAM
performance, even with linearly independent key vectors. However, this is not necessarily the case, as
we have noted above. We will quantify these remarks in our data (Section 4). In deriving Theorem 4,
we assume equal energy for all recollection vectors (but their energy is not assumed equal to that of
the key vectors).
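The growth of the trace term for nearly singular VIP matrices can be seen in a two-vector sketch. The example below assumes two unit-length key vectors in the plane separated by an angle theta, a configuration chosen only for illustration. For a 2 x 2 matrix, the trace of the inverse equals the trace of the matrix divided by its determinant, and the determinant here shrinks as the keys approach one another.

```python
import math

def trace_v_inverse(theta):
    """Tr(V^{-1}) for two unit-length keys separated by angle theta (radians)."""
    x1 = (1.0, 0.0)
    x2 = (math.cos(theta), math.sin(theta))
    v11 = x1[0] * x1[0] + x1[1] * x1[1]
    v12 = x1[0] * x2[0] + x1[1] * x2[1]
    v22 = x2[0] * x2[0] + x2[1] * x2[1]
    det = v11 * v22 - v12 * v12          # small when the keys are nearly parallel
    return (v11 + v22) / det             # trace of the inverse of a 2 x 2 matrix

# Linearly independent in every case, but increasingly close to dependent.
for degrees in (90, 30, 5, 1):
    print(degrees, trace_v_inverse(math.radians(degrees)))
```

The keys remain linearly independent at every angle tested, yet the trace term, and with it the HAM noise ratio of Theorem 4, grows rapidly as the angle decreases.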

We note that the ensemble averages in the equations in Theorem 4 make the
performance measure for a HAM impossible to evaluate, except by a Monte Carlo technique. The
Monte Carlo method calculates $\sigma_o^2/\sigma_i^2$ by averaging over a number of different
associative memories (i.e. different key and recollection vector pairs). For this reason, the results
of a Monte Carlo analysis as obtained earlier [16] are not necessarily a good estimation of
$\sigma_o^2/\sigma_i^2$ for specific problems. Thus, other $\sigma_o^2/\sigma_i^2$ expressions are
desirable, in which the expectation over the entire ensemble is not required. In addition, in the prior
tests [16], the recollection vectors used were random, had more than one "1", and had energy equal to
that of the key vectors. This is appropriate for an AAM, but is not the conventional HAM situation and
(as we shall show) the choice of the recollection vectors significantly affects HAM performance.
Specifically, the test results in [16] are not valid for unit recollection vectors, binary encoded
recollection vectors, etc. Also, if the dimensionality of the key and recollection vectors are
different, then the test results in [16] are not too useful. In addition, the variance of the
$\sigma_o^2/\sigma_i^2$ measure can be quite large (especially when averaged over a number of
different associative memories). Thus, the resultant $\sigma_o^2/\sigma_i^2$ average can be meaningless
and better (smaller) $\sigma_o^2/\sigma_i^2$ values can result for specific HAMs. When the rules we
derive for HAM design are used, better $\sigma_o^2/\sigma_i^2$ performance measures will result.
o  o 

Other $\sigma_o^2/\sigma_i^2$ expressions are possible in the case of unit recollection vectors,
$\mathbf{Y} = c\mathbf{I}$, where $c$ is a constant. In this case of HAMs with unit recollection vectors,

$$\sigma_o^2/\sigma_i^2 = (c^2/K)\,\mathrm{Tr}(\mathbf{V}^{-1}). \tag{9}$$

The second instance in which an equation without all expected value operators is possible is
the case of orthogonal key vectors. In this instance of HAMs with orthogonal key vectors,

$$\sigma_o^2/\sigma_i^2 = E\{y_{ij}^2\}\,\mathrm{Tr}(\mathbf{V}^{-1}), \tag{10}$$

where the expectation operator is the average over all squared elements of $\mathbf{Y}$. Since $c^2/K$
in Eq. (9) equals $E\{y_{ij}^2\}$ for $\mathbf{Y} = c\mathbf{I}$, Eq. (10) is equivalent to Eq. (9).
Thus, in terms of performance, the use of unit recollection vectors is analogous to using orthogonal
key vectors. This is a noteworthy result, since one might feel that orthogonal key vectors would yield
better performance. This result follows from linear algebra, since $\mathbf{V}$ (and $\mathbf{V}^{-1}$)
are diagonal if the key vectors are orthogonal, yielding only the trace elements of the matrix.
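Eq. (9) can be compared with Theorem 1 on a small example. The sketch below assumes two linearly independent key vectors with N = 3, unit recollection vectors scaled by a hypothetical constant c = 3, and K = M = 2. It synthesizes the pseudoinverse memory explicitly and evaluates both expressions. All data values and names are illustrative assumptions.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse_2x2(v):
    (a, b), (c, d) = v
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed keys (columns of X) and unit recollection vectors Y = c I.
X = [[1.0, 1.0],
     [0.0, 1.0],
     [2.0, 0.0]]
c = 3.0
Y = [[c, 0.0],
     [0.0, c]]
K, N = 2, 3

Xt = transpose(X)
V_inv = inverse_2x2(matmul(Xt, X))
memory = matmul(Y, matmul(V_inv, Xt))     # M = Y X^+

# Theorem 1: N * E{m^2}, averaged over all K * N elements of the memory.
theorem_1 = N * sum(m * m for row in memory for m in row) / (K * N)

# Eq. (9): (c^2 / K) * Tr(V^{-1}); no explicit memory matrix is needed.
eq_9 = (c ** 2 / K) * (V_inv[0][0] + V_inv[1][1])
```

The two values agree for this example, and Eq. (9) obtains the ratio from the inverse VIP matrix alone.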


For  cases  when  no  conditions  on  the  recollection  vectors  v^  (such  as  unit  recollection  vect< 
made  and  similarly  when  no  conditions  on  the  x^  key  vectors  are  made,  Theorem  1  can  be  usi 

O  O 

alternate  a  *  jo  expression  can  then  be  found  by  substituting  Eqs  (AlO)  and  ( A 1 3 )  in  the  app 
into  Theory  1  to  obtain 


a  o  =  I  k  ir  JE  v  ,  v  v  ,  , 
o  '  i  v  '  '•  .  mk  ■  im-  ik 

1  m  k 


(1 


where  v  .**  ls  the  mk-th  element  of  Nf1  Eq  ( 11)  is  equivalent  to  Theorem  1  However,  calcc 
using  Eq. ( 11)  are  preferable  since  it  provides  the  result  without  the  need  to  first  explicitly  comp 


In our quantitative test data, we will use Eqs.(8), (9) and (11) for different cases. Eq.(8) applies for AAMs with linearly independent key vectors and Eq.(9) applies for HAMs with linearly independent key vectors and with unit recollection vectors. Eq.(10) applies for orthogonal key vectors and Eq.(11) has no conditions on the recollection vectors or the key vectors.
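As an illustration of Eq.(11), the triple sum can be evaluated directly from $V^{-1}$ and Y and compared with Theorem 1 applied to the explicit memory matrix. This is a sketch only: the key vectors are random (hypothetical) and the two-class recollection coding is chosen just to have a case where Y is not a scaled identity.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 64, 8, 2
X = rng.random((N, M))                 # hypothetical key vectors (columns)

# Two-class coding: first half of the keys -> [1,0]^T, second half -> [0,1]^T.
Y = np.zeros((K, M))
Y[0, : M // 2] = 1.0
Y[1, M // 2 :] = 1.0

V_inv = np.linalg.inv(X.T @ X)

# Eq.(11): (1/K) * sum_i sum_m sum_k  v^-1_mk * y_im * y_ik
eq11 = np.einsum("mk,im,ik->", V_inv, Y, Y) / K

# Theorem 1 on the explicit matrix M = Y V^-1 X^T
memory = Y @ V_inv @ X.T
theorem1 = N * np.mean(memory ** 2)
print(eq11, theorem1)
```

The two numbers agree, while Eq.(11) never forms the K x N memory matrix.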


3.4 PREFERABLE SNR ASSOCIATIVE MEMORY PERFORMANCE MEASURE

All prior theoretical studies [1,16] of pseudoinverse associative memory noise performance have used the $\sigma_o^2/\sigma_i^2$ performance measure. Other work on associative memory capacity either does not consider HAMs, yields bounds (not exact expressions), or does not consider noise. This $\sigma_o^2/\sigma_i^2$ performance measure is valid for AAMs, but not for HAMs, since its resultant value can be reduced (improved) artificially by merely reducing the energy of the recollection vectors (i.e. by using unit rather than binary-encoded recollection vectors). Our $\sigma_o^2/\sigma_i^2$ data verifies that unit recollection vectors perform better than binary encoded ones. To see the problem with the $\sigma_o^2/\sigma_i^2$ measure, consider Theorem 1 for the case of a HAM. If we scale each $x_k$ by a constant $c_x$ and each $y_k$ by a constant $c_y$, then the new associative memory matrix is $M' = (c_y/c_x)M$, where M is the original associative memory matrix. The new expected value (denoted by an apostrophe) is related to the expected value for the original matrix (denoted by no apostrophe) by $E\{m_{ij}'^2\} = (c_y^2/c_x^2)E\{m_{ij}^2\}$. The new and old performance ratios are thus related by $(\sigma_o^2/\sigma_i^2)' = (c_y^2/c_x^2)(\sigma_o^2/\sigma_i^2)$. From this, we see that increasing $c_x^2/c_y^2$ results in an improved new $\sigma_o^2/\sigma_i^2$ ratio. However, this improvement is artificial. We note that this issue does not arise for the case of an AAM (since for this matrix the recollection and key vectors are the same, and thus have the same energy and scaling factors). These remarks also do not apply to earlier results [16], where equal energy key and recollection vectors were used in the Monte Carlo data obtained. This $\sigma_o^2/\sigma_i^2$ performance measure could be applied to an HAM with Y = I (or to binary-encoded recollection vectors, or to recollection vectors whose dimensionality K [...] N), by appropriately scaling the recollection vectors, such that their energy and that of the key vectors is the same. In general, with arbitrary key vectors and unit or other possible recollection vector encoding schemes, the need exists for a different performance measure.


The performance measure we introduce is the output-to-input SNR (signal-to-noise) ratio $SNR_o/SNR_i$. The larger this ratio, the better the performance. For equal key and recollection vector energies, this measure and $\sigma_o^2/\sigma_i^2$ are reciprocals. We define the signal powers as the expected value of the square of the elements minus the square of the expected value of the elements, i.e. we subtract off the average or bias energy from our calculations of signal energy. Thus, the signal energies we use are

$$s_i^2 = E\{x_{ki}^2\} - E\{x_{ki}\}^2 \qquad (12a)$$

$$s_o^2 = E\{y_{ki}^2\} - E\{y_{ki}\}^2, \qquad (12b)$$

where the energy values are averages over all elements i of all vectors k. The resultant SNR performance ratio is then

$$\frac{SNR_o}{SNR_i} = \frac{s_o^2\,\sigma_i^2}{s_i^2\,\sigma_o^2}. \qquad (13)$$
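The motivation for Eq.(13) can be illustrated numerically: rescaling the key and recollection vectors changes $\sigma_o^2/\sigma_i^2$ but leaves $SNR_o/SNR_i$ unchanged. The sketch below assumes random (hypothetical) key vectors, a two-class recollection coding, and arbitrary scale factors `c_x` and `c_y`; the function name `ratios` is illustrative.

```python
import numpy as np

def ratios(X, Y):
    """Return (sigma_o^2/sigma_i^2, SNR_o/SNR_i) for the HAM M = Y X^+."""
    N = X.shape[0]
    memory = Y @ np.linalg.pinv(X)
    sigma_ratio = N * np.mean(memory ** 2)     # Theorem 1
    s_i2 = np.var(X)                           # Eq.(12a): E{x^2} - E{x}^2
    s_o2 = np.var(Y)                           # Eq.(12b): E{y^2} - E{y}^2
    snr_ratio = s_o2 / (s_i2 * sigma_ratio)    # Eq.(13)
    return sigma_ratio, snr_ratio

rng = np.random.default_rng(2)
N, M = 64, 8
X = rng.random((N, M))                         # hypothetical key vectors
Y = np.zeros((2, M))
Y[0, :4] = 1.0
Y[1, 4:] = 1.0

c_x, c_y = 2.0, 0.5                            # arbitrary scale factors
sigma_a, snr_a = ratios(X, Y)
sigma_b, snr_b = ratios(c_x * X, c_y * Y)
print(sigma_b / sigma_a, snr_b / snr_a)
```

The first printed value equals $(c_y/c_x)^2$, reproducing the artificial change discussed in the text, while the second is one.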


For AAMs (with $s_o^2 = s_i^2$), Eq.(13) reduces to N/M (from Theorem 3), which is the reciprocal of Theorem 3.


Our concern lies with HAMs. For HAMs with unrestricted key vectors, we combine Eqs.(11) and (13) to obtain

$$\frac{SNR_o}{SNR_i} = \frac{s_o^2\,K}{s_i^2 \sum_i \sum_m \sum_k v^{-1}_{mk}\, y_{im}\, y_{ik}}. \qquad (14)$$


For HAMs, with Y = cI (or for the case of orthogonal key vectors), we combine Eqs.(10) and (13) to obtain

$$\frac{SNR_o}{SNR_i} = \frac{s_o^2}{s_i^2\, E\{y_{ki}^2\}\,\mathrm{Tr}[V^{-1}]}. \qquad (15)$$

For zero-mean key and recollection vectors, $s_o^2 = E\{y_{ki}^2\}$ and $s_i^2 = E\{x_{ki}^2\} = (1/M)\,\mathrm{Tr}[V]$. Under these assumptions, HAMs with Y = cI (or HAMs with orthogonal key vectors) yield

$$\frac{SNR_o}{SNR_i} = \frac{M}{\mathrm{Tr}[V]\,\mathrm{Tr}[V^{-1}]} = \frac{1}{M}, \qquad (16)$$

where the last equality holds for orthonormal key vectors, since $V = X^T X = I$ and $\mathrm{Tr}[V] = \mathrm{Tr}[V^{-1}] = M$ for this case. We will employ the different performance measures noted in Eqs.(13)-(15) in quantitative comparison tests of performance in Section 4.


3.5  DATA  MATRIX  AND  PSEUDOINVERSE  HAM  NOISE 
PERFORMANCE  COMPARISONS 


A brief comparison of the data matrix and pseudoinverse HAM with unit recollection vectors (Y = I) is now provided. Linearly independent key vectors, each normalized to unity, with all elements positive, are assumed. This is necessary for a comparison with no differences in the key vectors, since the pseudoinverse HAM requires linearly independent key vectors and the data matrix associative memory requires normalized key vectors. The HAM with $M = Y X^+$ and the data matrix with $M = X^T$ are both M × N in size. The data matrix is thus equivalent to a pseudoinverse HAM with $Y = X^T X$. Thus, in our performance comparison, we compare a HAM with Y = I to a HAM (the data matrix) with $Y = X^T X$. We use Theorem 1,

$$\sigma_o^2/\sigma_i^2 = N\,E\{m_{ij}^2\}, \qquad (17)$$

since it applies for any matrix. For the HAM with Y = I and M < N, Eq.(17) is most likely less than one. For the data matrix, with each row being a normalized key vector, the sum of the squared elements of the M rows of the matrix is just M, the average squared element is M/MN = 1/N, and $\sigma_o^2/\sigma_i^2 = N(1/N) = 1$ from Eq.(17). With Eq.(17) being less for the Y = I HAM, it will have less output noise for a given input noise level. This better performance is expected, since all outputs of the Y = I HAM recollection vector are expected to be zero (except one). To consider how output noise affects recall accuracy in the two memories, note that all Y = I HAM outputs are ideally zero except for the single element with a "1" output, whereas for the data matrix, the non-one output elements are the vector inner products of the input and the different references and will clearly be greater than zero. Thus, the same amount of output noise in each memory can more easily cause data matrix output elements to be in error (more easily than is possible for the Y = I HAM). These differences must be weighed against the advantages of the data matrix HAM: it does not require linearly independent key vectors, it yields nearest neighbor performance, it has a large storage capacity (compared to even the HAM with Y = I) and it easily allows the contents of the data matrix to be altered (by simply changing the vector in one row of the matrix).
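The data matrix value of Eq.(17) can be checked with a short sketch. The key vectors here are random, positive and normalized (hypothetical data, not the aircraft images), so the value printed for the Y = I HAM is specific to this key set and is shown only so the two memories can be compared side by side.

```python
import numpy as np

def noise_ratio(memory):
    """Eq.(17): sigma_o^2 / sigma_i^2 = N * E{m_ij^2}."""
    return memory.shape[1] * np.mean(memory ** 2)

rng = np.random.default_rng(3)
N, M = 64, 8
X = rng.random((N, M))                  # positive elements (hypothetical keys)
X /= np.linalg.norm(X, axis=0)          # each key vector normalized to unity

data_matrix = X.T                       # M x N, one key vector per row
ham_identity = np.linalg.pinv(X)        # pseudoinverse HAM with Y = I, M x N

# The data matrix is the pseudoinverse HAM with Y = X^T X.
as_ham = (X.T @ X) @ np.linalg.pinv(X)

print(noise_ratio(data_matrix), noise_ratio(ham_identity))
```

The first value is exactly one for any set of normalized key vectors; the second depends on $\mathrm{Tr}[V^{-1}]$ and therefore on how correlated the chosen keys are.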



4. QUANTITATIVE DATA


This section describes our database, several different associative memories formed, test results on the associative memories for specific case studies using the different performance measures derived in Section 3 and the Appendices.


4.1  DATABASE 


The database used to provide quantitative test data (versus numerical calculations based on theory) for specific pattern recognition problems consisted of 32 × 32 pixel [...] binary images of aircraft. Each image was lexicographically ordered into an input key vector of dimension N = 32² = 1024. Two different aircraft, a Phantom and a DC10, were used. The aircraft occupied approximately 15% of the full 32 × 32 input image frame. Different images of each aircraft rotated in yaw formed different versions of each aircraft for use in different tests and [...] database. The Phantom-18 database contains 18 Phantom jet images at 20° increments in yaw over a full 360° variation. Our DC10-18 database is similar, with DC10 images used. We employ a Phantom-36 and DC10-36 database set in other tests. These databases contain 36 images per aircraft, with 10° increments in yaw now used. We refer to the set of images used to form the memory as the reference or training set. In some cases, we test the performance of the memory using other non-training set images at different yaw rotations. We refer to these as test data. For one HAM, Phantom and DC10 data are used and the purpose of the associative memory formed is to distinguish the type of the aircraft, as well as its orientation. In another HAM test, we consider only determining the class of the aircraft, and not its orientation. For noise tests of $\sigma_o^2/\sigma_i^2$ and $SNR_o/SNR_i$, we added zero-mean Gaussian noise with five different standard deviations $\sigma_i$ to the reference Phantom [...] image. For each input test image with a given $\sigma_i$ or $SNR_i$, we form 10 different input images (different input test vectors) with the same $\sigma_i$ value and $SNR_i$ value (however using 10 different realizations, different seed values, for the given input noise level). In all noise tests, noisy input images were not rebinarized. This allows a better comparison between theory and tests. To p[...] model certain real time optical spatial light modulators, we should rebinarize the noisy input [...]. However, we feel that the results obtained with gray-level input test vectors would be representative of those obtained using rebinarized input key vectors to our associative processors.


4.2  TYPES  OF  ASSOCIATIVE  MEMORIES  TESTED 


To test and quantify associative memory performance, three different associative memories were considered. For consistent results, all memories employed M = 36 key/recollection input vector pairs (the Phantom-18 and DC10-18 databases). The AAM was formed from Eq.(2) with Y = X. Two different HAMs were also constructed. HAM-1 used unit recollection vectors with Y = I in Eq.(2), with a different K = M = 36 element output recollection vector for each of the 36 input images. The second HAM-2 tested had N = 1024 and M = 36 (as did all associative memories constructed) and used only two element (K = 2) output recollection vectors $[1,0]^T$ and $[0,1]^T$ for the Phantom and DC10 inputs respectively (i.e. all 18 Phantom key vectors were assigned the same output recollection vector $[1,0]^T$, with the other recollection vector $[0,1]^T$ used for all DC10 inputs). Since both Phantom and DC10 inputs were used in fabricating the associative memories, they achieve both intra-class recognition (e.g. the recognition of different distorted versions of the same aircraft, i.e. a Phantom) and inter-class discrimination (distinguishing a Phantom from a DC10). The HAM-2 is appropriate for image analysis when the type of object rather than its orientation is desired. This is quite different from the HAMs conventionally considered. For all associative memories, we calculated $X^+$ using the IMSL Generalized Inverse Subroutine. All key vectors were found to be linearly independent. This was verified from a calculation of the condition number ($\lambda_{max}/\lambda_{min} = 183$) of $V = X^T X$, which showed that the rank of V, which equals the rank of X, was M = 36. The pseudoinverse thus equals $X^+$ in Eq.(3).


4.3 ASSOCIATIVE MEMORY TEST RESULTS USING THE $\sigma_o^2/\sigma_i^2$ MEASURE

Our initial test results are summarized in Table 1. Each entry in this table is the average of 10 realizations of noise with the standard deviation listed. The performance measure tabulated is $\sigma_o^2/\sigma_i^2$ for the AAM and the two HAMs constructed. The averages of the measured $\sigma_o^2/\sigma_i^2$ values for all 50 noise image tests for each associative memory are given at the bottom of the table. The theoretical value for the AAM is calculated as M/N from Eq.(8) and it agrees quite well, within [...], with the measured average. For both HAMs, theory and experiment also agreed quite well (within 1.5% and 11%). The theoretical values for HAM-1 (with unit recollection vectors) were calculated from the trace of $V^{-1}$ in Eq.(9) with c = 1 and K = M = 36. For the second HAM with only K = 2 output elements, we calculate the theoretical value using Eq.(11). Several initial obvious remarks are in order. First, we note general good agreement between theory and tests. Secondly, we note that HAM-1 $\sigma_o^2/\sigma_i^2$ performance is 50% better than that of the AAM (the lower performance measures indicate better performance).
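The test procedure for the AAM can be sketched as a small Monte Carlo run: five noise levels, ten realizations each, with the measured $\sigma_o^2/\sigma_i^2$ averaged over all 50 runs and compared with M/N. This is an illustration under stated assumptions, not a reproduction of Table 1: the key vectors are random binary vectors with about 15% ones (hypothetical stand-ins for the aircraft images), and because the memory is linear, the output noise is obtained by passing the noise vector alone through the memory.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 1024, 36

# Hypothetical binary key vectors (columns); not the Phantom/DC10 images.
X = (rng.random((N, M)) < 0.15).astype(float)
aam = X @ np.linalg.pinv(X)              # AAM matrix M = X X^+, N x N

measured = []
for sigma in (0.2, 0.3, 0.4, 0.5, 0.6):  # five input noise levels
    for _ in range(10):                  # ten realizations per level
        noise = rng.normal(0.0, sigma, size=N)
        out_noise = aam @ noise          # output noise of the linear memory
        measured.append(np.mean(out_noise ** 2) / sigma ** 2)

average = float(np.mean(measured))
theory = M / N                           # Eq.(8) for the AAM
print(average, theory)
```

With only 50 realizations the measured average fluctuates by a few percent around M/N, which is the same order as the agreement reported in the text.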


The results (for the specific key and recollection vectors chosen) are quite different from other [...] Monte Carlo results averaged over different HAMs (using random key and recollection vectors). These prior results predicted average HAM performance to be worse than that for AAMs by at least 10% when M > 0.2N. Our final comments concern the performance of the two HAMs. The second HAM (with only two output recollection vectors and two recollection vector elements) performs worse. This occurs since this matrix is 2 × 1024, with its first row being a sum of the first 18 rows of the first HAM and its second row being a sum of the second 18 rows of the first HAM. Recall that the size of the first HAM-1 is 36 × 1024. In this case, summing the rows of M increases $E\{m_{ij}^2\}$ and causes an increase in $\sigma_o^2/\sigma_i^2$ (and thus poorer performance). In general, summing the rows of the first HAM will not always increase $E\{m_{ij}^2\}$, since the elements of M are bipolar. Here, an increase occurred because the key vectors corresponding to the added rows are members of the same class (rotated yaw views of the same aircraft) and are thus similar, causing the added rows to be similar. We discuss these results and preferable performance measures later in Section 4.4.
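The row-sum relation between the two HAMs described above can be verified with a short sketch. The key vectors are random binary vectors (hypothetical, with a reduced N = 256 to keep the example small); only the structure of the recollection matrices follows the text.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 256, 36
X = (rng.random((N, M)) < 0.15).astype(float)   # hypothetical binary keys
X_pinv = np.linalg.pinv(X)                      # M x N

ham1 = np.eye(M) @ X_pinv                       # HAM-1: Y = I, 36 x N

Y2 = np.zeros((2, M))                           # HAM-2: K = 2 class coding
Y2[0, :18] = 1.0                                # first 18 keys  -> [1,0]^T
Y2[1, 18:] = 1.0                                # second 18 keys -> [0,1]^T
ham2 = Y2 @ X_pinv                              # 2 x N

# Each HAM-2 row is the sum of the corresponding 18 rows of HAM-1.
row_sums = np.vstack([ham1[:18].sum(axis=0), ham1[18:].sum(axis=0)])

ratio1 = N * np.mean(ham1 ** 2)                 # Theorem 1 for HAM-1
ratio2 = N * np.mean(ham2 ** 2)                 # Theorem 1 for HAM-2
print(ratio1, ratio2)
```

Whether `ratio2` exceeds `ratio1` depends on how similar the summed rows are, which is the data dependence the text points out.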


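TABLE 1: $\sigma_o^2/\sigma_i^2$ for AAM and HAM

$\sigma_i$    AAM       HAM-1 (Y = I)   HAM-2
0.2           [?]       [?]             0.0949
0.3           [?]       0.0218          0.153
0.4           [?]       0.0253          0.0949
0.5           0.0323    0.0180          0.201
0.6           0.0387    0.0236          0.0655
average       0.0364    0.022[?]        0.122
theory        0.0352    0.0218          0.136

[?] marks entries that are illegible in the source.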

4.4 ASSOCIATIVE MEMORY TEST RESULTS USING THE $SNR_o/SNR_i$ MEASURE

We now test and compare our three associative memories using our SNR ratio performance measure. Our results are shown in Table 2. Larger values for this performance measure indicate better performance. In each case, the data presented is the average of 50 runs for five different noise $\sigma_i$ values, with the measured data obtained from image domain tests. These measured data are then compared to the associated theoretical equations. The AAM results are the reciprocal of those given in Table 1. For HAM-1 (with unit recollection vectors), $s_o^2/s_i^2$ is small and for HAM-2 (with $[1,0]^T$ or $[0,1]^T$ recollection vectors) this ratio is large (since HAM-1 has more zeroes in each recollection vector). Thus, the SNR performance of HAM-2 is better than for HAM-1 (although its $\sigma_o^2/\sigma_i^2$ performance was worse). Eq.(13) and Table 1 were used for all theoretical calculations in Table 2. From these specific tests, we find AAM noise performance to be better than HAM noise performance (as one would expect) and that different HAMs (such as those with K = 2 output elements, the number of general classes of the data) are preferable to the conventional HAMs (with Y = I unit recollection vectors with K = M = 36 elements and 36 output unit vectors). This represents a m[...] result. These quantitative results in Table 2 are not necessarily general trends, but are data dependent, as we now discuss.


The performance of an AAM depends solely upon the M and N values. HAM performance depends upon $V^{-1}$, with HAM-1 performance depending only upon the diagonal elements of $V^{-1}$ (because DC10s are slightly larger than Phantoms, the diagonal elements are not the same) and with HAM-2 performance depending upon all elements of $V^{-1}$. Since HAM performance depends upon the key vectors used, no general conclusion on AAM versus HAM performance is possible. However, HAMs with new (binary) recollection vector coding consistently perform better than HAMs with conventional unit recollection vectors. Our theory in Section 3 predicted this (for the [...] performance measure). The presence of the elements of Y (recollection vectors) in our equations in Section 3 confirms this theoretically and our test data in Table 2 quantify it. As discussed, $s_o^2/s_i^2$ is large for HAM-2, which is the reason why our new HAM-2 outperforms HAM-1.

TABLE 2: $SNR_o/SNR_i$ for AAM and HAM

$SNR_o/SNR_i$   AAM      HAM-1 (Y = I)   HAM-2 ($[1,0]^T$, $[0,1]^T$ outputs)
average         27.47    9.14            15.33
theory          28.41    9.26            13.75

4.6  LARGE  CLASS  PROBLEMS 


The concern in associative processors should be large class problems (M large). We now briefly consider how AAM and HAM performance varies with M/N. We expect performance to decrease as M/N increases. From Eq.(8), we expect AAM performance to reduce linearly as M increases. For HAMs, the performance variation with M will depend upon the specific data. Table 3 shows initial results obtained. Eqs.(8), (15) and (14) were used for the three associative memories respectively. The second database used 36 images of each aircraft at 10° yaw increments and thus represents a larger M = 72 class problem. AAM performance is seen to be linear with M and thus reduces by a factor of 2, as shown. The reduction for the HAMs is data dependent. From these data, we clearly see that HAM performance does not degrade as fast as AAM performance and that at M = 72, the performance of HAM-2 and the AAM are approaching each other. Again, this result is not a general trend that one can always be assured of (since HAM performance is data dependent). However, this lends further justification for attention to HAM storage capacity and noise performance and to different output recollection vector encoding schemes.


TABLE 3: Associative Memory $SNR_o/SNR_i$ Performance as M Increases

                                  TYPES OF ASSOCIATIVE MEMORY
Database                  M     AAM     HAM-1 (Y = I)   HAM-2 ($[1,0]^T$ and $[0,1]^T$)
Phantom-18, DC10-18       36    28.4    9.26            13.75
Phantom-36, DC10-36       72    14.2    6.56            11.84


5.  ASSOCIATIVE  MEMORY  UPDATING 


Brief remarks are now advanced on updating (adding, deleting and reassigning key/recollection vector pairs) in associative memories. We now use subscripts to denote the number of vector pairs stored. In the case of an associative memory formed from M key/recollection vector pairs, $M_M = Y_M X_M^+$. In this case, we can add a new key/recollection vector pair and calculate the new $M_{M+1}$ matrix from the new $X_{M+1}^+$. This is possible directly from $X_M^+$ by the bordering algorithm [...]. Extensions of this algorithm allow a vector pair to be deleted. This deletion is easiest if the last x and y vectors are the ones to be removed. To delete another vector pair, we first make this the last vector pair. When the VIP associative memory synthesis technique is used, the key vectors are orthonormal (this can be achieved on-line, as we have discussed elsewhere), and updating of the matrix M is very simple. In a VOP associative memory, the addition of a vector pair is achieved by determining the amount of each vector that is new (orthogonal to the prior vector pairs) and including it (as a VOP, etc.) in the memory matrix. Deleting a vector proceeds similarly, but requires that the vectors be removed from all prior vectors. Updating a data matrix memory is trivial, as the associated row is simply replaced, with no concern for the other rows.
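Adding a key vector without recomputing the pseudoinverse from scratch can be sketched with Greville's column-append recursion. This is offered as one well-known way to obtain $X_{M+1}^+$ from $X_M^+$; it is an assumption that it plays the same role as the bordering algorithm cited above, and the function name `append_key` and the random test data are hypothetical.

```python
import numpy as np

def append_key(X, X_pinv, x_new, tol=1e-10):
    """Pseudoinverse of [X, x_new] from the pseudoinverse of X (Greville)."""
    d = X_pinv @ x_new                  # expansion of x_new on the stored keys
    c = x_new - X @ d                   # part of x_new orthogonal to stored keys
    if np.linalg.norm(c) > tol:
        b = c / (c @ c)                 # x_new is independent of stored keys
    else:
        b = (X_pinv.T @ d) / (1.0 + d @ d)   # x_new lies in span of stored keys
    # New pseudoinverse: old rows corrected by d b^T, plus one new row b^T.
    return np.vstack([X_pinv - np.outer(d, b), b])

rng = np.random.default_rng(6)
N, M = 32, 5
X = rng.random((N, M))                  # hypothetical stored key vectors
x_new = rng.random(N)                   # key vector to be added
updated = append_key(X, np.linalg.pinv(X), x_new)
direct = np.linalg.pinv(np.column_stack([X, x_new]))
print(np.max(np.abs(updated - direct)))
```

The updated memory then follows as $M_{M+1} = Y_{M+1} X_{M+1}^+$, with the new recollection vector appended as the last column of Y.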

6. SUMMARY AND CONCLUSION


This paper has advanced various new theories and expressions for associative memories for neural processing. We first noted the different types of associative memories, the key vector assumptions generally made and the fact that many of these assumptions are not necessarily valid. We advanced new on-line VIP-GS techniques to calculate the pseudoinverse memory from an orthogonal basis set. We also noted the differences in storage capacity and noise performance (both issues must be considered together) for AAMs and HAMs. We advanced a new and preferable performance measure for more general classes of HAMs. We also derived equations which allow the performance of different associative memories to be computed more easily and without Monte Carlo techniques. Our results showed that HAM performance depends on the key and recollection vector choice (whereas AAM performance depends only upon the values of M and N). We have noted the similarity between associative memory synthesis and LDFs as used in pattern recognition. We find HAM performance to be quite dependent on the set of recollection vectors, and we offered new associative memory designs with new recollection vectors (with better performance than conventional HAMs), designs incorporating LDF design techniques, and associative memories with increased memory capacity and reduced memory size. Initial results with such memories appear very promising. Initial remarks on associative memory updating with several new algorithms were also advanced.
APPENDIX A1: PROOF OF THEOREM 1


The output noise vector is

$$n_o = M\,n, \qquad (A1)$$

where n is the input noise vector with elements $n_j$. Substituting (A1) into the definition of $\sigma_o^2$ yields

$$\sigma_o^2 = E\Big\{\Big(\sum_j m_{ij}\,n_j\Big)^2\Big\} = \sum_j \sum_k E\{m_{ij}\,m_{ik}\}\,E\{n_j n_k\}. \qquad (A2)$$

Using the property of uncorrelated noise that $E\{n_j n_k\} = E\{n_j^2\}\,\delta_{jk}$, the definition $\sigma_i^2 = E\{n_j^2\}$, and the independence of $\sigma_i^2$ from j and k, we obtain

$$\sigma_o^2 = N\,E\{m_{ij}^2\}\,\sigma_i^2. \qquad (A3)$$

Dividing both sides by $\sigma_i^2$, we obtain Theorem 1. This result is valid for any matrix whose key vectors are of dimension N and not just for the pseudoinverse matrix solution. Writing the squared Euclidean norm of M, we see [17] that the minimum norm solution is $M = Y X^+$. It can also be shown that this solution is optimal for uncorrelated noise, and that it minimizes $E\{m_{ij}^2\}$ and also $\sigma_o^2/\sigma_i^2$ (i.e. the SNR ratio for the case of uncorrelated noise).


APPENDIX A2: PROOF OF THEOREM 2

For independent key vectors, the solution in Eq.(2) with $X^+$ defined by Eq.(3) is valid and thus for an AAM

$$M = X X^+ = X (X^T X)^{-1} X^T. \qquad (A4)$$

The trace of $M M^T$ is

$$\mathrm{Tr}(M M^T) = \sum_i (M M^T)_{ii} = \sum_i \sum_j m_{ij}^2 = \mathrm{Tr}(M), \qquad (A5)$$

where the last equality follows from the fact that M is idempotent ($M = M^2$) and symmetric ($M = M^T$). The eigenvalues of an idempotent matrix are 0 or 1. The number of eigenvalues that are 1 is r(M), i.e. the rank of M, and the trace satisfies Tr(M) = r(M). To determine r(M) for $M = X X^+$, we first show that r(X) = M and that $r(X^+)$ = M. It then follows that r(M) = M = Tr(M). Thus,

$$\mathrm{Tr}(M) = \sum_i \sum_j m_{ij}^2 = M. \qquad (A6)$$

Using (A6), we prove Theorem 2:

$$E\{m_{ij}^2\} = \frac{\mathrm{Tr}(M)}{N^2} = \frac{M}{N^2}. \qquad (A7)$$

APPENDIX  A3:  PROOF  OF  THEOREM  3 


This follows directly by substituting (A7) into (A3).


APPENDIX A4: PROOF OF THEOREM 4


We consider Theorem 1, which applies for any matrix, and derive an expression for $E\{m_{ij}^2\}$ for the HAM matrix written as $M = Y V^{-1} X^T$. We first rewrite (A5) for the general HAM case of recollection vectors of dimension K as

$$\mathrm{Tr}[M M^T] = \sum_i (M M^T)_{ii} = \sum_i \sum_j m_{ij}^2, \qquad (A8)$$

where the summation over i runs from 1 to K and the summation over j runs from 1 to N. To evaluate the Theorem 1 equation for a HAM, we must obtain an expression for $E\{m_{ij}^2\}$. Letting the key vectors $x_k$ (of dimension N) and the recollection vectors $y_k$ (of dimension K) be random variables, we form the expected value of both sides of (A8) to obtain

$$E\{\mathrm{Tr}[M M^T]\} = \sum_i \sum_j E\{m_{ij}^2\}. \qquad (A9)$$

The double summation in (A9) can be rewritten as

$$E\{\mathrm{Tr}[M M^T]\} = K N\,E\{m_{ij}^2\}. \qquad (A10)$$

To evaluate Theorem 1 for this case and hence $E\{m_{ij}^2\}$, we require the trace of $M M^T$. To obtain this, we substitute Eqs.(2) and (3) for an HAM into $M M^T$ and find

$$M M^T = Y V^{-1} Y^T. \qquad (A11)$$

The diagonal elements of the matrix product in (A11) are

$$(M M^T)_{ii} = \sum_m \sum_k v^{-1}_{mk}\, y_{im}\, y_{ik}, \qquad (A12)$$

where both summations are over the M vector pairs. The trace is the sum of (A12) over the diagonal elements (i = 1 to K), yielding

$$\mathrm{Tr}(M M^T) = \sum_i \sum_m \sum_k v^{-1}_{mk}\, y_{im}\, y_{ik}. \qquad (A13)$$

To evaluate (A9) and hence $\sigma_o^2/\sigma_i^2$, we form the expected value of both sides of (A13) and [...] the expected value operator within the summation as in (A9). With statistically uncorrelated key and recollection vectors, $v^{-1}_{mk}$ and $y_{im} y_{ik}$ have no cross-correlations and the expected value of their product is the product of their expected values. In practice, this assumption is not realistic, since the $y_k$ depend upon the $x_k$ and are thus correlated (except for the case Y = I). In tests in [16], each element of each $y_k$ was chosen at random for the data that they used. Thus, $E\{y_{im}\, y_{ik}\} = E\{y_{im}^2\}\,\delta_{mk}$. This result is not valid for binary encoded $y_k$ vectors, but is valid for unit recollection vectors. With these assumptions,

$$E\{\mathrm{Tr}[M M^T]\} = \sum_i \sum_m \sum_k E\{v^{-1}_{mk}\}\,E\{y_{im}\, y_{ik}\} = \sum_m E\{v^{-1}_{mm}\} \sum_i E\{y_{im}^2\}, \qquad (A14)$$

where the last equality follows since $E\{v^{-1}_{mm}\}$ is independent of i. The second summation in (A14) is K times the expected value $E\{y_{im}^2\}$ and is independent of m (for the case of recollection vectors with equal power). This yields

$$E\{\mathrm{Tr}[M M^T]\} = K\,E\{y_{im}^2\}\,E\{\mathrm{Tr}[(X^T X)^{-1}]\}. \qquad (A15)$$
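Combining (A15) with (A10) and Theorem 1 can be written out step by step. The lines below are a sketch that uses only those three results, with $V = X^T X$ as elsewhere in the text and the symbols read as in the appendix equations.

```latex
\begin{align*}
K N\, E\{m_{ij}^2\} &= K\, E\{y_{im}^2\}\, E\{\mathrm{Tr}[(X^T X)^{-1}]\}
  && \text{(A10) equated with (A15)} \\
N\, E\{m_{ij}^2\} &= E\{y_{im}^2\}\, E\{\mathrm{Tr}[V^{-1}]\}
  && \text{divide by } K \\
\frac{\sigma_o^2}{\sigma_i^2} &= E\{y_{im}^2\}\, E\{\mathrm{Tr}[V^{-1}]\}
  && \text{Theorem 1: } \sigma_o^2/\sigma_i^2 = N\, E\{m_{ij}^2\}
\end{align*}
```

For a fixed (non-random) key set the outer expectation on the trace drops out, which gives the form of Eq.(10).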


Substituting (A15) into (A10) and the result into Theorem 1, we prove Theorem 4.
ABSTRACT

A directed graph processor and several optical realizations of its input symbolic feature vectors and the multiprocessor operations required per node are given. This directed graph processor has advantages over tree and other hierarchical processors because of its large number of interconnections and its ability to adaptively add new nodes and restructure the graph. The use of the basic concepts of such a directed graph processor offers significant impact on: associative, symbolic, inference, feature space and correlation-based AI processors, as well as on knowledge base organization and procedural knowledge control of AI processors. Initial iconic alphanumeric data base results presented are most promising.


1.  INTRODUCTION 


Hierarchical tree classifiers1,2 have long been used in pattern recognition, particularly for non-parametric problems.3,4 Much has been written concerning optimization of tree structures using information theory techniques.5,6,7,8 However, hierarchical structures have many drawbacks.9 A major problem is that an incorrect decision at a given node can result in misclassification, since subsequent nodes are not designed to accommodate prior classes. Back-tracking through the tree can compensate for this, but at the expense of classification speed.10 The major problem is the rigid structure of the tree itself, its limited number of interconnections, and its lack of adaptivity. The optimization techniques mentioned in the literature5,6,8 are very cumbersome and require a great deal of processing. This becomes a problem when an additional class has to be appended to the tree. The problem is that the new class must be added as a terminal node of the existing tree,9 but classification of future objects of this type is penalized since the new node was not fully integrated into the tree structure. To maintain optimization, the tree must be entirely redesigned, using one of the optimization schemes cited above, for each new added node. This report suggests an alternate modeling for large-class classification problems using directed graphs. Our new version of directed graph techniques is very flexible because new classes and restructured graphs can be accommodated easily without penalty. Our proposed algorithm for directed graph construction is ideal for parallel optical architectures that can quickly perform the computationally intensive steps of multiple filter or discriminant function comparisons at each node of the graph. Optical processing is particularly attractive because of its ability to perform many parallel comparisons concurrently.

The  outline  of  the  paper  follows.  Section  2  explains  the  topic  of  directed  graphs  and  introduces  the  terminology 
used  to  describe  them.  Section  3  extends  the  concepts  of  a  directed  graph  to  model  general  classification  problems. 
Section  4  outlines  our  directed  graph  algorithm  and  shows  its  versatility  for  adaptation  and  alteration/adaptivity  in 
the  construction  and  use  of  the  graph.  Potential  methods  of  handling  input  object  distortions  are  also  presented. 
Section  5  outlines  potential  optical  architectures  to  produce  feature  spaces  and  to  implement  the  directed  graph 
algorithm  in  parallel.  Section  6  summarizes  the  findings  of  this  report. 

2.  DIRECTED  GRAPHS 


A directed graph (sometimes called a digraph) is a collection of nodes or vertices v_n, and a collection of arcs joining some or all of the vertices.11 An example of a directed graph is shown in Figure 1. Note that the graph does not have to be symmetrical. The presence of an arc from v_1 to v_2 does not guarantee that an arc also exists from v_2 to v_1. Symmetry between vertices can be accommodated in this structure by explicitly connecting two nodes with an arc in each direction, as shown between v_0 and v_3. Two vertices joined by an arc are adjacent. The indegree (outdegree)11 of vertex v_n is defined as the number of arcs entering (leaving) v_n. A loop is an arc starting and ending on the same vertex, like the one at v_[?]. A path exists between two vertices if one can travel from one to the other along existing arcs, as between v_[?] and v_3. The cardinality of a path is the number of arcs contained in that path. A path which starts and ends at the same vertex, such as v_2 → v_[?] → v_[?] → v_2, is called a circuit. A graph is disconnected if some nodes are not reachable from other nodes. This is the case for vertices v_[?]-v_5 which are disconnected from


Figure  1:  Directed  graph 


An adjacency matrix [11] $A$ determines the arcs between vertices, where the element $A(i,j)$ is equal to one if the graph contains an arc originating from vertex $v_i$ and ending at $v_j$, or is equal to zero otherwise. Each row of the adjacency matrix gives the set of adjacent vertices for a given node. The indegree (outdegree) of vertex $v_n$ is equal to the sum of the entries in the $n$th column (row) of $A$.
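The row and column sums described above can be sketched in a few lines of Python. The 4-vertex graph used here is a made-up example (it is not the graph of Figure 1), and the function name `degrees` is introduced only for illustration.

```python
def degrees(A):
    """Return (indegree, outdegree) lists for a 0/1 adjacency matrix A."""
    n = len(A)
    outdeg = [sum(A[i]) for i in range(n)]                        # row sums
    indeg = [sum(A[i][j] for i in range(n)) for j in range(n)]    # column sums
    return indeg, outdeg


# Hypothetical graph: v0->v1, v0->v3, v1->v2, v2->v0, v3->v0
A = [
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
indeg, outdeg = degrees(A)
print(indeg, outdeg)
```

In this example the pair of arcs between v0 and v3 is the kind of explicit two-way connection the text uses to represent symmetry.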

A set of matrices $\{A_n\}$ can be defined where each row of $A_n$ is the set of vertices that can be reached by paths of cardinality $n$ or less. Using this definition, $A_1 = A$ describes simple adjacent vertices. Let the operation $\otimes$ denote binary matrix multiplication, calculated as normal matrix multiplication with numerical multiplication and addition being replaced by logical AND and OR operations respectively. Similarly let $\oplus$ denote a matrix logical OR operation, and let $I$ denote the identity matrix. Then:

$$A_{n+1} = A_n \otimes (I \oplus A), \quad n \geq 1. \qquad (1)$$

Simply stated, Eq. (1) states that $v_j$ is reachable from $v_i$ with a path of cardinality $n$ or less if either $A_{n-1}(i,j)$ is one or $A_{n-1}(i,x)$ and $A(x,j)$ are both one for some $x$, i.e., a path of cardinality $n-1$ or less must exist from $v_i$ to $v_x$ and an arc from $v_x$ to $v_j$ must also exist. Since all problems are finite, meaning that the size of $A$ is finite, a stable result ($A_{m+1} = A_m$) will occur for some finite $m$. $A_m$ is called the extent matrix $E$; it contains the set of vertices that are reachable from every node by any directional path.
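A minimal sketch of the iteration in Eq. (1) follows, assuming the matrix inside the parentheses is the identity OR-ed with the adjacency matrix. The helper names `bool_matmul` and `extent_matrix` and the small test graph are illustrative assumptions, not part of the original report.

```python
def bool_matmul(P, Q):
    """Binary matrix product: AND replaces multiplication, OR replaces addition."""
    n = len(P)
    return [[int(any(P[i][x] and Q[x][j] for x in range(n))) for j in range(n)]
            for i in range(n)]


def extent_matrix(A):
    """Iterate A_{n+1} = A_n (x) (I OR A) until the result stops changing."""
    n = len(A)
    I_or_A = [[1 if i == j else A[i][j] for j in range(n)] for i in range(n)]
    current = [row[:] for row in A]          # A_1 = A
    while True:
        nxt = bool_matmul(current, I_or_A)
        if nxt == current:                   # stable: this is the extent matrix
            return current
        current = nxt


# Hypothetical graph: circuit v0->v1->v2->v0 plus an arc v3->v0
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
E = extent_matrix(A)
print(E)
```

Here every row of E marks v0, v1 and v2 as reachable, while the column for v3 stays zero because no arc enters v3.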

3.  DIRECTED  GRAPHS  FOR  OBJECT  CLASSIFICATION 


A classification space can be modeled as a directed graph by mapping each class to a node in the graph. If a wide discrepancy exists between individual members of a given class, distinct subsets of that class can be mapped to different vertices. (In further discussion the term "class" will be used to define the set of objects represented by a node or vertex, regardless of whether such a set is in reality a subclass of a larger class which is represented by several vertices.) Each vertex has associated with it a data vector, either an image or a feature vector, for the given class. The arcs between vertices are chosen to show the similarity or connectedness between classes. If two vertices are adjacent, the classes they represent should be more similar than two classes represented by non-adjacent vertices. The primary focus of directed graph object classification is to determine $A$. Our attention centers on three issues: the construction of such an $A$ (or graph), its use in pattern recognition, and the role of parallel multi-processor optical systems in such a directed graph organization of a knowledge base, of procedural knowledge, or of a control system.

Object classification is achieved by finding the vertex (node) within the graph which best matches the input data vector. The process could start by comparing the input data vector to several selected vertices in the graph. The starting vertex is the one which most resembles the input data vector. The data vector is then compared to each of the neighbors of this node. Assuming the starting vertex does not represent the input class, a move is made along the arc to the neighbor vertex which is most similar to the input data vector. The input vector is then compared to each of the neighbor vertices of this new node. This process continues until the vertex being examined is more similar to the input vector than any of its neighbor vertices. If the similarity exceeds a certain threshold, then the input belongs to the class represented by that vertex. If the threshold is not exceeded, this vertex is a local maximum. One then continues the search to find other higher maxima (using perturbation, i.e. jumps to other regions of the graph). If every node has been examined and no maximum exceeds the threshold, the input data is viewed as a new class and either a new node (class) is added to the graph or the graph is restructured (depending upon memory limitations).

Searching through a directed graph is very similar to traversing a hierarchical tree classifier. All such algorithms yield the final node much more quickly than a breadth-first search of every node. The usefulness of the directed graph approach we discuss is the increased flexibility of its structure compared to that of a tree. Unlike a tree, one can start concurrently at several different places within a graph. In addition, changing the starting node is not just a superficial improvement like jumping to a lower node in a tree. Assuming the graph is connected and that each vertex is reachable, the whole classification space can be searched from any node, which is not the case for a tree. However, the order of a graph search can vary significantly, since it is strongly dependent upon the starting node. If a crude estimate can be made about the approximate location of the unknown input class within the graph, starting nodes can be picked in that general neighborhood. This will greatly reduce the search time required to examine the whole classification space. We discuss this in Section 4.4.5 and in Section 6.

A  major  benefit  of  the  directed  graph  approach  is  the  ease  with  which  new  classes  can  be  included  in  the  graph. 
Adding  a  new  class  to  a  tree  is  restrictive,  since  additional  nodes  can  only  be  affixed  to  terminal  nodes  or  leaves  of 
the  tree;  otherwise  the  whole  tree  must  be  redesigned.  The  interconnections  of  a  graph,  on  the  other  hand,  can  be 
extended  to  incorporate  new  nodes  quite  easily.  Once  a  graph  is  modified  to  include  a  new  class,  classification  of 
objects  of  that  class  occurs  as  routinely  as  for  objects  in  the  original  classes.  Details  of  this  procedure  are  given  in 
Section  4. 

There are a number of pitfalls of varying severity that can be encountered in a directed graph classifier
procedure. These include:

1. Disconnected subgraphs within the graph. This could make proper classification impossible, unless
perturbations are included (as we suggest and detail) or unless the interconnections of the graph (as we
detail) are designed properly.

2. Vertices within a given subgraph with indegree equal to zero. The problem is that a vertex with an
indegree of zero is unreachable from any other vertex and could only be located if it was declared as a
starting node. The choice of starting vertices should include some of these nodes. Our graph synthesis
method and our perturbation step overcome this problem.

3.  Local  maxima.  The  unknown  class  is  theoretically  reachable  from  the  starting  node,  but  is  not  found  due 
to  the  presence  of  local  maxima  in  the  maximum-ascent  approach.  Rather  than  backtracking,  we  employ 
perturbations  to  overcome  this  problem. 

4. Circuits (cyclic paths) within A. A circuit exists whenever a diagonal element of E is non-zero, meaning
that a node is reachable from itself. Since a maximum-ascent algorithm can never return to the same
point while still traveling uphill, a circuit is actually a redundant structure which can never be utilized
but could reduce the processing speed of a classifier. For a completely connected graph, circuits are
unavoidable. Since every node is reachable from every other, a parent node must be reachable from its
neighbor nodes. This requires the use of many circuits. These circuits should be as long as possible,
reserving shorter paths for realizable traversals. Shorter circuits will increase the average search time
since they force more useful paths through the graph to be longer in length.
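Three of the pitfalls listed above (zero-indegree vertices, circuits, and unreachable node pairs) can be detected directly from the adjacency matrix. The following is only a diagnostic sketch under our own assumptions: it uses a plain depth-first reachability search rather than the extent-matrix iteration, and the names `reachable_from` and `diagnose` and the example graph are hypothetical.

```python
def reachable_from(A, start):
    """Set of vertices reachable from `start` along one or more arcs."""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        for v, arc in enumerate(A[u]):
            if arc and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def diagnose(A):
    """Report zero-indegree vertices, vertices on a circuit, and unreachable pairs."""
    n = len(A)
    reach = [reachable_from(A, i) for i in range(n)]
    zero_indegree = [j for j in range(n) if not any(A[i][j] for i in range(n))]
    on_circuit = [i for i in range(n) if i in reach[i]]   # non-zero diagonal of E
    unreachable = [(i, j) for i in range(n) for j in range(n)
                   if i != j and j not in reach[i]]
    return zero_indegree, on_circuit, unreachable


# Hypothetical graph: circuit v0->v1->v2->v0 plus an arc v3->v0
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
zero_indegree, on_circuit, unreachable = diagnose(A)
print(zero_indegree, on_circuit, unreachable)
```

In this example v3 has indegree zero, so it can only be found if it is chosen as a starting node or reached by a perturbation jump.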


Our directed graph processor uses perturbation, ensures connectivity, reachability and long circuits, and employs hard decisions (rather than simulated annealing techniques) to overcome these potential problems. A recurring problem in large class searches is local maxima [9]. Our two solutions to this problem are now noted. Backtracking is included in our graph by including a working memory with the prior node (not taken) with the largest correlation. Perturbation is included in our graph algorithm by allowing jumps to new graph regions or prior high-correlation nodes. We prefer hard decisions to simulated annealing (which allows moves to less optimal nodes to occur with finite probability, depending on the correlations or VIP values obtained) to reduce the search space and search time. The high threshold $\tau$ we employ also facilitates correct classification (we adjust $\tau$ depending upon the number of image pixels and the amount of noise expected).

For pattern recognition applications requiring distortion invariance, we will generally employ a distortion-invariant feature space, using optically generated features [12]. For high-clutter and multi-object cases, we will utilize optical correlators. When distortion-invariance is required in this latter case, smart correlation filters are utilized [13]. For more advanced problems, symbolic correlators are utilized [14, 13]. We emphasize the general knowledge base structure and interconnection (hence its relevance to associative processors, neural processors, and to procedural knowledge rules as well as implicit declarative knowledge inference machines). We use the general term correlation to refer to the use of the nearest neighbor filters per graph node in a correlator or the use of VIPs on input feature vectors. The use of multi-class SDF feature extraction filters [16] to test the $M$ nearest neighbors per node is not recommended (for the large class case considered here) since unknown (untrained) inputs per node can produce erroneous results. Thus, the discriminant vector or filter used per node in the graph is that due to the one class considered at that node (this filter can be, and in many cases is, a single class SDF). This filter choice yields better high-confidence results, which is our goal (versus simulated annealing).

4.  CONSTRUCTION  AND  USAGE  OF  A  DIRECTED  GRAPH  CLASSIFIER 
4.1  PARALLELISM  AND  MULTI-PROCESSORS 


In order to build a directed graph classifier, the outdegree $M$ of each node must be selected. $M$ is often selected depending upon the parallelism possible in the processing architecture. If $M = 1$, a search through the graph would be entirely sequential. If the number of nodes is $L$, the search time would then be on the order of $LT$, where $T$ is the time required to perform the one correlation at each node.

For cases when $M \geq 2$, the number of nodes which must be searched ($M$ comparisons per node) in an "optimal" complete directed graph is on the order of $\log_M L$, for $L > M$ [10]. With $M$ nodes checked at each level, an optimal $L$-class classifier will require $x$ levels, where

$$L = \sum_{k=0}^{x} M^k = \frac{M^{x+1} - 1}{M - 1}$$

nodes, i.e. $L(M-1) \approx M^{x+1}$. Taking the $\log_M$ of both sides gives $\log_M L + \log_M (M-1) = x + 1$. Assuming $M \gg 1$, then $\log_M (M-1) \approx 1$ and we find $x = \log_M L$.

This assumes that the graph is laid out such that every node can be reached by exactly one path of length $\log_M L$ or less. A graph which satisfies this condition from any set of starting nodes is very difficult to obtain. A graph efficiency $\gamma \leq 1$ shall be defined as the inverse of the factor by which the actual search time exceeds the optimal search time of $\log_M L$. Thus,

$$\text{Search time} = O\left(\frac{1}{\gamma} \log_M L\right) = O\left(\log_{M^{\gamma}} L\right). \qquad (2)$$


$\gamma$ is a measure of the interconnectedness within a graph. Large $\gamma$ is preferable. $\gamma$ is very dependent on the size and structure of the graph, as well as the starting nodes chosen. We expect $\gamma$ to decrease as $L$ increases. If the decrease is not too rapid, good performance will still result. Graphs with many short circuits generally have poor interconnections and will have low values of $\gamma$ and longer classification times. Conversely, graphs with few short path length circuits will have higher $\gamma$ values and faster classification times. A trade-off must be reached to allow for sufficient interconnections while keeping the classification speed high.
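The two forms of the search-time bound are related by a change of logarithm base. The short derivation below makes that step explicit; the numerical values (L = 1000, M = 10, efficiency 0.5) are chosen purely for illustration and do not come from the report.

```latex
\frac{1}{\gamma}\log_M L
  = \frac{\ln L}{\gamma \ln M}
  = \frac{\ln L}{\ln M^{\gamma}}
  = \log_{M^{\gamma}} L ,
\qquad
\text{e.g. } L = 1000,\ M = 10,\ \gamma = 0.5:\quad
\frac{1}{0.5}\log_{10} 1000 = 6 = \log_{\sqrt{10}} 1000 .
```

In this illustration a graph with efficiency 0.5 needs six levels of search where an optimal layout would need three.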

For a sequential (or single channel processor) system, the processing time at a given node is equal to the time it takes to correlate the input with the node's $M$ neighbors, which is equal to $MT$. Therefore, the total processing time is $O\left(\frac{1}{\gamma} M T \log_M L\right)$. Since $L$ and $T$ are constant, the optimum $M$ is obtained by minimizing the processing time with respect to $M$ for $M \geq 2$. Assuming $\gamma$ is independent of $M$, we find the minimum total search time for a sequential one processor system when $M = 2$. This result is faster than the prior $M = 1$ case.

If a parallel processor (or multi-processor system) which can perform $N$ correlations concurrently is used, the time required per node is $O(nT)$, where $n$ is the lowest integer such that $n \geq M/N$ and $T$ is the processing time to perform the $N$ concurrent correlations. The number of nodes which must be searched is still $O\left(\frac{1}{\gamma} \log_M L\right)$. The minimum processing time is found by minimizing $\frac{1}{\gamma} n T \log_M L$ with respect to $M$, which occurs when $M = N$ assuming $\gamma$ is not a function of $M$. Therefore, optimal classification speed for parallel multi-processors occurs when the outdegree $M$ of each node is equal to the number of processors (i.e. the number of correlations or node VIPs which can be performed concurrently by the parallel system). We use the term correlation to refer to the operation required at each of the $M$ neighbor nodes. This can be a vector inner product (VIP) for the case of input features and some symbols. It can be a 2-D correlation for the case of iconic (image pixel) input data.

A  similar  analysis  shows  that  the  same  value  of  M  also  represents  the  optimal  number  of  starting  or  initial 
vertices  for  a  given  architecture. 
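The parallel-processor trade-off can be checked numerically. The sketch below evaluates the cost expression n T log_M L over a range of outdegrees, assuming a constant efficiency and the illustrative values N = 8 processors and L = 1000 classes; these numbers and the function name `search_cost` are assumptions made for the example.

```python
import math


def search_cost(M, N, L, T=1.0, gamma=1.0):
    """Cost n*T*log_M(L)/gamma, with n the smallest integer >= M/N."""
    n = math.ceil(M / N)
    return (n * T / gamma) * math.log(L) / math.log(M)


N, L = 8, 1000                      # hypothetical processor count and class count
costs = {M: search_cost(M, N, L) for M in range(2, 33)}
best_M = min(costs, key=costs.get)
print(best_M, round(costs[best_M], 3))
```

For outdegrees up to N the per-node time stays at one batch of correlations, so a larger M only shortens the search; past N an extra batch is needed at every node, which outweighs the shorter path in this example.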

4.2 SELECTION OF M NEAREST NEIGHBORS


The construction of the graph from initial data and the updating of the graph for new data are analogous. For $L$ input data column vectors $x_i$, their similarity is described by the VIP matrix $R$ with elements $r(i,j) = x_i^T x_j$ for $i, j \leq L$. We normalize $R$ by weighting it by $w$ to obtain $R_m = w^T R w$, understood element by element as $r_m(i,j) = w(i)\, r(i,j)\, w(j)$, where $w$ is a column vector with elements

$$w(i) = \left[ \sum_j x_i(j)^2 \right]^{-1/2}.$$

Normalization by the difference between the input data vector and the mean data vector is also possible. The weighting by the inverse of the magnitude of the data vector produces $R_m$ with diagonal elements equal to 1 and all other elements less than one. This prevents a vector with a high magnitude from dominating the correlation results, while still retaining the positive-definite nature of $R$. From $R_m$, one can produce an adjacency matrix $A$ with elements

$$a(i,j) = \begin{cases} 1 & \text{if } r_m(i,j) \text{ is one of the } M \text{ largest elements in row } i \text{ of } R_m,\ i \neq j \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

The provision that $i \neq j$ prevents single node loops in $A$. The reachable extent matrix $E$ can then be determined using Eq. (1).

From tests, we find that $A$ computed from $R_m$ by Eq. (3) alone yields a well-structured graph of nearest-neighbors, but is not necessarily a well connected graph. This is especially apparent when one considers a multi-class problem where there are $M+1$ very similar classes. Using the above procedures alone, these $M+1$ classes will form an isolated subgraph, unconnected from all the remaining nodes. We thus use $R_m$ to assign outgoing arcs and a more detailed procedure (detailed below in Section 4.4) to provide incoming arcs and the connectivity of the graph.
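The nearest-neighbor construction of this section, and the isolated-subgraph problem it can produce, can be illustrated with a small sketch. It assumes the normalized VIP is the inner product divided by the two vector magnitudes; the four 2-D data vectors and the names `normalized_vip` and `adjacency` are invented for the example.

```python
import math


def normalized_vip(X):
    """Matrix of inner products x_i . x_j scaled by 1/(|x_i| |x_j|)."""
    w = [1.0 / math.sqrt(sum(c * c for c in x)) for x in X]
    L = len(X)
    return [[w[i] * sum(a * b for a, b in zip(X[i], X[j])) * w[j]
             for j in range(L)] for i in range(L)]


def adjacency(Rm, M):
    """Arc i->j for the M largest off-diagonal entries of row i."""
    L = len(Rm)
    A = [[0] * L for _ in range(L)]
    for i in range(L):
        others = sorted((j for j in range(L) if j != i),
                        key=lambda j: Rm[i][j], reverse=True)
        for j in others[:M]:
            A[i][j] = 1
    return A


# Hypothetical data: two pairs of mutually similar vectors
X = [[1, 0], [2, 1], [0, 1], [-1, 3]]
Rm = normalized_vip(X)
A = adjacency(Rm, M=1)
print(A)
```

With M = 1 each pair of similar vectors points only at each other, giving two isolated subgraphs of M + 1 nodes. This is the connectivity problem that the incoming-arc procedure of Section 4.4 is meant to remove.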


4.3  DEFINITIONS 


The  following  definitions  will  be  used  in  subsequent  analysis: 

1. $L$ is the number of classes currently represented in the graph;

2. $L_{max}$ is the maximum number of classes (nodes) permissible in the graph. It is upper-bounded by the memory constraints of the system;

3. $M$ is the maximum outdegree permissible for any node; it is determined by the degree of parallelism in the processing architecture (the number of channels which can be processed concurrently);

4. $x$ is a column vector representing the new original input data;

5. $x'$ is the normalized data vector for the new input data;

6. $s_i$, $i \leq L_{max}$, is the normalized data vector (discriminant vector) of class $i$ (i.e. at node $i$);

7. $\tau$ is the acceptable threshold which must be exceeded for a match to occur between the input and a given class;

8. $v_c$ is the node currently being examined;

9. $v_L$ is a new node being appended to the graph;

10. $C(i,j)$, $i \leq L_{max}$, $j \leq M$, is the $j$-th highest element in the $i$-th row of $R_m$;

11. $K(i,j)$, $i \leq L_{max}$, $j \leq M$, is the column number of the $j$-th highest element in the $i$-th row of $R_m$;

12. $I(i)$, $i \leq L_{max}$, is the indegree of $v_i$;

13. $E(i,j)$, $i, j \leq L_{max}$, is the $(i,j)$ element of the reachable extent matrix;

14. $Z(i)$, $0 \leq i \leq L_{max}$, is a work array containing the result of correlations or VIPs of $x'$ with previously stored classes (represented by $s_i$).

The matrices $C$ and $K$ are actually abbreviated versions of $R_m$ and $A$, respectively (containing their largest elements). The $i$-th row of $K$ contains the column numbers $j$ where $a(i,j) = 1$. Similarly, the $i$-th row of $C$ contains the elements of $R_m$ corresponding to the same locations where $a(i,j) = 1$. The $C$ and $K$ matrices reduce the storage requirements by a factor of $L : M$.
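As an illustration of how the compact matrices relate to the full normalized VIP matrix, the sketch below extracts, for every row, the M largest off-diagonal values and their column numbers. It uses 0-based indices, a hypothetical 4-class matrix, and an invented function name `compact_rows`.

```python
def compact_rows(Rm, M):
    """Return (C, K): per row, the M largest off-diagonal values and their columns."""
    C, K = [], []
    for i, row in enumerate(Rm):
        order = sorted((j for j in range(len(row)) if j != i),
                       key=lambda j: row[j], reverse=True)[:M]
        K.append(order)                    # column numbers, best first
        C.append([row[j] for j in order])  # matching correlation values
    return C, K


# Hypothetical normalized VIP matrix for four classes
Rm = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.5, 0.3],
    [0.2, 0.5, 1.0, 0.8],
    [0.1, 0.3, 0.8, 1.0],
]
C, K = compact_rows(Rm, M=2)
print(C)
print(K)
```

Each of the two compact arrays holds L x M entries instead of L x L, which is the storage saving described above.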

4.4  OPERATION 


Figure 2 illustrates the basic operation of a directed graph classifier. The input data $x'$ is normalized and (if required) distortion invariant. An initial threshold $\tau \leq 1$ is defined to determine whether an acceptable match has been found at each node. We make $\tau$ high enough so that distinct classes will not be categorized together and yet not so high that any noise in the input will inhibit proper classification and force the graph to create a new class. With low noise expected, one should set $\tau$ conservatively high. Then, even minor deviations in a prior input will cause the graph to treat the input as a new class. As the number of classes grows and approaches $L_{max}$, the threshold is lowered, similar nodes (classes) are combined and the graph is restructured. This forces a new segmentation of the data. This will enable the classifier to adjust $\tau$ to the actual problem set, while controlling the number of nodes in the graph. The input data can be time sequential scenes, objects, or the contents of a knowledge base. Assume that the input will be a sequential stream of class data including noise and possible distortions. The steps of the algorithm for synthesis or use of the graph follow.

4.4.1  Initialization  of  the  Graph 

1.  Initialize  all  matrices  to  zero. 

2. Preprocess the first $M$ input data vectors $x_i$, yielding $x'_i$.

3. Since we started from a zero-class classifier, these $M$ vectors are stored as the first $M$ nodes ($s_i$ for $i \leq M$) in the graph. They are used as the initial starting vertices. $L$ is set to $M$.

4.4.2 Classification of New (Subsequent) Input Data Vectors (Iterations)

This iterative procedure applies in general when the graph contains more than $M$ nodes.

1.

a. Preprocess the input data vector to yield $x'$.

b. Correlate $x'$ with each of the starting vectors in the graph. Set the current node $v_c$ to the vector with the highest correlation with $x'$, and store the correlation as $Z(0) = \max_i [x'^T s_i]$, for all $i \leq M$. $Z(0)$ is the current maximum correlation.

Figure 2: Block diagram of a directed graph classifier

Figure 3: The addition of a new node to a directed graph classifier

c. Correlate $x'$ with the $M$ neighbors of $v_c$, found in the matrix $K$. Store these results in $Z(K(c,i)) = x'^T s_{K(c,i)}$ for all $i \leq M$. These calculations are not excessive: some of the neighbors of the current node could be neighbors of previously searched nodes, in which case their correlations would already have been calculated and stored in $Z$. Recalculation of them is not necessary.

d. Look for the highest correlation among the neighbors of $v_c$. If this is greater than $Z(0)$ then set $Z(0)$ equal to it, set $v_c$ to that node, and repeat step c.

e. At some point $Z(0)$ is greater than the correlation at any of the neighbor nodes. If $Z(0) > \tau$, then the input is classified as belonging to the class represented by $v_c$. Classification of the input is now complete and the next input vector can be classified.

f. If $Z(0) < \tau$, we recognize $v_c$ as a local maximum of the graph. In the case of construction of the graph, we examine all nodes, using backtracking or perturbations (to new graph areas or to prior nodes with a high $Z$, i.e. perturb or jump by backtracking). We now briefly discuss three techniques [9] to avoid being trapped in a local maximum. They generally apply to use of the graph, rather than construction of it.

Backtracking: This involves going back to a previous node and taking an alternative route. This technique can avoid searches for poor solutions.

Perturbation: This technique permits random jumps to unsearched nodes of the graph.

Simulated annealing: This is a non-deterministic searching process which allows "downhill" rather than "uphill" moves to occur with a small (but finite) probability, depending on the ratio of the data vector's correlation with $v_c$ and each of the neighbors of $v_c$.

In operation, we prefer (in order of preference) to: (1) jump to the next largest starting node (if its correlation is close to that of the initial node chosen), (2) jump to an alternate neighbor of a prior node, or (3) perturb to unexamined areas of the graph.

These searching techniques in steps (a) to (f) continue until a match is found or until every node in the graph has been searched. This procedure is much faster and easier than might appear. The number of steps required (and hence the number of nodes searched) is $O(\log_{M^{\gamma}} L)$ and the memory is $O(L)$. If the entire graph is searched and no correlation exceeds $\tau$, then a new node must be added to the graph. The procedure is outlined in Section 4.4.3.
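Steps (a) to (f) can be summarized by the following sketch of a maximum-ascent search with a correlation cache. It is a simplification under stated assumptions: the only escape from a local maximum is a jump to a node that has not yet been correlated (the report also allows backtracking), and the function name `classify`, the 2-D class vectors and the threshold value are all hypothetical.

```python
def classify(x, S, K, starts, tau):
    """Maximum-ascent search with a correlation cache and perturbation fallback.

    x: normalized input vector; S: stored class vectors; K: neighbor index lists;
    starts: indices of the starting nodes; tau: match threshold.
    Returns (node index, correlation), or (None, best correlation seen) when no
    node exceeds tau, i.e. the input should be treated as a new class.
    """
    Z = {}  # cache so that no correlation is computed twice

    def corr(i):
        if i not in Z:
            Z[i] = sum(p * q for p, q in zip(x, S[i]))  # vector inner product
        return Z[i]

    visited = set()
    pending = sorted(starts, key=corr, reverse=True)    # best starting node first
    while True:
        if not pending:
            # Perturbation: jump to one node that has not been correlated yet.
            pending = [i for i in range(len(S)) if i not in Z][:1]
            if not pending:
                return None, max(Z.values(), default=0.0)
        c = pending.pop(0)
        if c in visited:
            continue
        while True:                                     # climb while a neighbor is better
            visited.add(c)
            best = max(K[c], key=corr) if K[c] else c
            if corr(best) > corr(c):
                c = best
            else:
                break
        if corr(c) > tau:
            return c, corr(c)


# Hypothetical classes spread over a quarter circle, one forward neighbor each
S = [(1.0, 0.0), (0.866, 0.5), (0.5, 0.866), (0.0, 1.0)]
K = [[1], [2], [3], [2]]
known = classify((0.0, 1.0), S, K, starts=[0], tau=0.95)
novel = classify((-1.0, 0.0), S, K, starts=[0], tau=0.95)
print(known, novel)
```

The first input climbs from node 0 to node 3 and is accepted there. The second never exceeds the threshold anywhere, so the caller would go on to add it as a new class.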

4.4.3 Addition of a New Node

This step outlines how a graph can be modified to include a new class. A block diagram of the procedure is shown in Figure 3. Its steps follow.

1.

a. Increment $L$, the number of classes stored in the graph, by one. If $L > L_{max}$, reorganize the graph as in Section 4.4.4. If not, proceed as below.

b. Store $x'$ in $s_L$. This is the data vector for the new class, which will be represented by $v_L$ in the graph.

c. Add $M$ outgoing arcs from $v_L$. If $Z(i)$ is the $j$-th highest element ($1 \leq j \leq M$) in $Z$, then set $C(L,j) = Z(i)$ and $K(L,j) = i$ and increment $I(i)$ by one. This establishes arcs emanating from $v_L$ to its $M$ closest neighbors as set by $R_m$. These new neighbors will be referred to as forward neighbors. This establishes the outdegree of $v_L$ as $\min(L, M)$.

d. Establish ingoing arcs to $v_L$. This step requires certain precautions to maintain connectivity and reachability. We require that every node have a non-zero indegree. This implies that the sole ingoing arc to some node $v_k$ cannot be broken to establish an arc to $v_L$ unless $v_L$ in turn has $v_k$ as a forward neighbor, re-establishing connectivity to $v_k$. This will force the graph to be connected, while also preventing isolated subgraphs. We achieve this in an ordered manner as follows.

i. Check all previous nodes to see if an arc should connect any of them to $v_L$, i.e. if $v_L$ correlates well with a prior node (better than some prior arc). To retain the graph's symmetry, this requires that $Z(i) > C(i,M)$ for some $v_i$. To guarantee connectivity, $v_{K(i,M)}$ must still be reachable from $v_L$ without the arc $v_i \to v_{K(i,M)}$ (i.e. another way must exist to reach the node whose ingoing arc was broken from $v_L$). Reachability is found using a modified $A$ matrix where $a(i, K(i,M)) = 0$.

ii. If step i returns a positive result for some $v_i$, the arc connecting $v_i$ to $v_{K(i,M)}$ can be broken and replaced with one connecting $v_i$ to $v_L$. The reachable extent of $v_i$ and the connectivity of the graph will not be adversely affected. $C(i,M)$ and $K(i,M)$ are changed to $Z(i)$ and $L$, respectively. The $i$-th row of $C$ and $K$ is now sorted to accommodate the new data. This step is repeated for all $v_i$ which apply.

iii. If no ingoing arcs to $v_L$ are formed using the above steps, meaning that $I(L) = 0$, we must still force a connection. This is most conveniently done by breaking an arc from some other node that also has an ingoing arc to a forward neighbor of $v_L$. The forward neighbor with the highest correlation is the best choice. The arc is then reconnected to the new node as outlined in step ii. This will maintain the graph's connectivity at the expense of potential small drops in the graph's classification speed when searching for particular classes.

e. The reachable extent of $v_L$ is stored in the $L$-th row of $E$. It is equal to the union of the set of the neighbors of $v_L$ with the set of all nodes reachable from those neighbors. This means it is unity in any column $j$ where $A(L,j) = 1$ or $E(K(L,k), j) = 1$ for any $k \leq M$.
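The arc bookkeeping for a new node can be sketched as follows. This is a simplified reading, not the report's implementation: it covers the outgoing arcs of step c and the arc redirection of steps d.i and d.ii, and it omits the forced connection of step d.iii and the update of the extent matrix. It assumes each row of C is kept sorted in descending order, uses 0-based indices, and the names `reachable` and `add_node` and the three-class starting graph are hypothetical.

```python
def reachable(K, src, dst, skip_arc):
    """True if dst can be reached from src along arcs in K without using skip_arc."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in K[u]:
            if (u, v) == skip_arc or v in seen:
                continue
            if v == dst:
                return True
            seen.add(v)
            stack.append(v)
    return False


def add_node(x, S, C, K, indeg, M):
    """Append x as a new class node; S, C, K and indeg are updated in place."""
    Z = [sum(p * q for p, q in zip(x, s)) for s in S]   # VIPs with stored classes
    new = len(S)
    # Step c: outgoing arcs to the (at most) M best-correlated existing nodes.
    fwd = sorted(range(new), key=lambda i: Z[i], reverse=True)[:M]
    S.append(x)
    K.append(fwd)
    C.append([Z[i] for i in fwd])
    indeg.append(0)
    for i in fwd:
        indeg[i] += 1
    # Steps d.i and d.ii: node i drops its weakest arc in favor of the new node
    # only if the node losing that arc stays reachable from the new node.
    for i in range(new):
        if not K[i] or Z[i] <= C[i][-1]:
            continue
        old = K[i][-1]
        if not reachable(K, new, old, skip_arc=(i, old)):
            continue
        indeg[old] -= 1
        indeg[new] += 1
        pairs = sorted(zip(C[i][:-1] + [Z[i]], K[i][:-1] + [new]), reverse=True)
        C[i] = [c for c, _ in pairs]
        K[i] = [k for _, k in pairs]
    return new


# Hypothetical 3-class graph with M = 1 arranged as the circuit 0 -> 1 -> 2 -> 0
S = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
K = [[1], [2], [0]]
C = [[0.0], [0.0], [-1.0]]
indeg = [1, 1, 1]
new = add_node((0.6, 0.8), S, C, K, indeg, M=1)
print(new, K, indeg)
```

In this example only node 0 redirects its arc: nodes 1 and 2 also correlate better with the new vector than with their current neighbors, but breaking their arcs would leave a node unreachable from the new node. The result is a single circuit through all four nodes.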


4.4.4 Reorganization of the Graph

If $L$ exceeds $L_{max}$, the graph has outgrown the algorithm. The threshold $\tau$ must be lowered so that new classes are not encountered as frequently and such that old prior classes can be merged. The following procedure lowers $L$ by one node, merging several prior nodes and reorganizing the graph, while still retaining the graph's connectedness. It can be used repetitively until $L \leq L_{max}$.

1.

a. $\tau$ should be lowered so that it is equal to the highest value of the first column of $C$.

b. Merge the node $v_i$, which satisfies $C(i,1) = \tau$, with node $v_{K(i,1)}$. The data vectors of these two nodes can be averaged together to create a new discriminant vector representative of the two merged classes.

c. All arcs to $v_i$ and $v_{K(i,1)}$ are broken and replaced by arcs to other existing nodes. This step is equivalent to removing the $i$-th and $K(i,1)$-th columns of both $A$ and $R$. This could potentially affect the connectivity of the graph. If this occurs, the replacement arcs should be chosen so that the connectivity is re-established.

d. The indegrees of all the forward neighbors of $v_i$ and $v_{K(i,1)}$ are reduced by one. This removes the $i$-th and $K(i,1)$-th columns of $C$ and $K$. At this point, both $v_i$ and $v_{K(i,1)}$ are removed from the graph.

e. The merged node is now added to the graph using the procedure outlined in Section 4.4.3.

4.4.5 Multiple Initial Starting Nodes and Meta-Vertices

To improve the connectivity and reachability of all nodes in the graph, meta-vertices can be established. These vertices are not class nodes, but are used to connect subgraphs (isolated from the graph). These nodes slow processing and search time and are avoided in our graph synthesis algorithm. We mention them as a possibility for severe cases.

At the initial input to the graph, we enter the graph at $M$ points (since we have $M$ processors, we use them at all levels, i.e. at the initial level also). For this case, meta-vertices (or other key or parent vertices) are usable as some of the initial choices for the $M$ starting nodes.

5. OPTICAL IMPLEMENTATION


Optical architectures are very appealing for this algorithm since they can easily perform the feature extraction
and required correlation operations in parallel. One architecture to achieve the M correlations (or vector inner
products, VIPs, required per node) in parallel is shown in Figure 4. In this figure, the preprocessed input data vector
x' is applied to a single-channel acousto-optic (AO) cell. The cylindrical lens L1 vertically replicates the data vector
across the correlation plane where a spatial light modulator (SLM), such as a multi-channel AO cell, is placed. The
spatial light modulator contains one data vector on each of its rows. The projection of x' onto each of these rows
produces the point-by-point product of every component of x' with the corresponding components of the data vectors
stored in the SLM. Another cylindrical lens (L2) sums these products across each row, producing the VIP of x'
with each data vector stored in the SLM. L2 focuses the correlation results on a linear detector array, where they
are fed to an external controller.

The controller is responsible for loading the SLM with the necessary data vectors to traverse the directed graph.
It initially loads the SLM with the starting vertices of the graph. It then detects the highest output and assigns r as
that node. The neighbors of that node are loaded into the SLM, and the process continues until the input is
classified as either an existing class or a new class which must be accommodated in the graph. Other optical
architectures (such as ones with input point modulators, a one-channel AO cell, and N 1-D time integrating detector
arrays) are also viable alternatives. Variations of each system to allow high-accuracy encoded data processing are
also possible. In Figure 4, one would input an encoded description of each element, perform a high-accuracy
multiplication (by convolution), and continue for the next vector element.
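
The controller loop described above can be sketched in software as follows. This is a minimal sketch, not the authors' controller: it assumes unit-norm data vectors (so that a VIP behaves like a correlation coefficient), assumes the search stops when the best output reaches a threshold (existing class) or stops improving (treated here as a new class), and uses hypothetical names (`vip`, `classify`) and hypothetical sample data.

```python
def vip(x, y):
    """Vector inner product, the quantity the optical system forms per SLM row."""
    return sum(a * b for a, b in zip(x, y))


def classify(x, vectors, neighbors, start_nodes, threshold):
    """Traverse the directed graph. Returns (node, correlation); node is None
    when no stored class reaches the threshold (input treated as a new class)."""
    candidates = list(start_nodes)
    best_node, best_val = None, float("-inf")
    visited = set()
    while candidates:
        # One optical pass: all candidate correlations are formed in parallel.
        outputs = {n: vip(x, vectors[n]) for n in candidates}
        node = max(outputs, key=outputs.get)
        if outputs[node] <= best_val:
            break                          # no improvement: stop searching
        best_node, best_val = node, outputs[node]
        if best_val >= threshold:
            return best_node, best_val     # existing class found
        visited.update(candidates)
        # Load the neighbors of the best node for the next pass.
        candidates = [n for n in neighbors[best_node] if n not in visited]
    return None, best_val


# Hypothetical four-class graph with unit-norm data vectors.
vectors = {0: [1, 0, 0, 0], 1: [0, 1, 0, 0], 2: [0, 0, 1, 0], 3: [0.6, 0.8, 0, 0]}
neighbors = {0: [1, 3], 1: [3, 2], 2: [0], 3: [1, 2]}
known = classify([0.6, 0.8, 0, 0], vectors, neighbors, [0, 2], 0.99)
novel = classify([0, 0, 0, 1], vectors, neighbors, [0, 2], 0.99)
print(known, novel)
```

In the optical system the dictionary comprehension corresponds to a single pass through the SLM, so the loop count, not the number of VIPs, sets the classification time.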




Figure  4:  Example  Architecture  for  an  Optical  Directed  Graph  Classifier 

The  hybrid  architecture  of  Figure  4  and  its  variations  use  the  best  of  two  different  technologies:  optics  is  used  to 
handle  the  heavy  computational  burden,  while  digital  memory  provides  the  storage  of  the  data  vectors  and  the 
graph’s  A,  C  and  K  matrices.  Such  a  system  is  suitable  for  very  large  classification  problems  as  we  now  quantify. 

The key component of this system is obviously the SLM. As shown, maximum classification speed for a parallel
directed graph classifier is obtained when M is set equal to the number of correlations which can be performed
concurrently. Therefore, M is set by the number of data vectors which can be stored in the SLM. For example,
consider a 16-channel AO cell as the SLM, with digital hardware capable of loading the cell at a 16 Mbps rate (1
Mbps per AO channel). This would allow M=16. A 50-long vector with 8-bit resolution for each vector component
could thus be passed through each cell in 0.4 ms. This will be the time T required to perform the parallel
correlations. The controller synchronizes the SLM data with the input data. Since T is greater than the propagation
time through the multi-channel AO cell, the system performs time integration in T = 0.4 ms per node searched.
Assuming a total of $2^{12}$ classes (L=4,096), the average time for classification would then be
$O(T \log_M L) \approx 1.2$ ms. Here we see that the penalty for back-tracking is the addition of T (a 30% increase)
for each back-tracked step. The digital memory requirement for this example is approximately 0.5 Mbits.
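
The timing figures in this example can be checked with a few lines of arithmetic. The sketch below assumes that the logarithm in the classification-time estimate is taken to base M, which is the reading that reproduces the quoted 1.2 ms. The variable names are illustrative only.

```python
import math

vector_length = 50         # components per data vector
bits_per_component = 8     # resolution of each component
channel_rate_bps = 1e6     # 1 Mbps per AO channel
M = 16                     # AO channels, i.e. parallel correlations
L = 2 ** 12                # number of classes

# Time to clock one data vector through a channel (seconds per node searched).
T = vector_length * bits_per_component / channel_rate_bps
# Number of graph levels visited on average, assuming log base M.
levels = math.log(L, M)
avg_time = T * levels

print(f"T = {T * 1e3:.1f} ms, levels = {levels:.0f}, average = {avg_time * 1e3:.1f} ms")
print(f"one back-tracked step adds {T / avg_time:.0%} of the average time")
```

Under these assumptions one back-tracked step adds one third of the average time, which the text rounds to 30%.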

Another alternative is a liquid crystal SLM, which presently offers a resolution of about 100×100 at video rates (30
Hz) with 32 grey levels (5 bits/pixel). The processing time T per node is now 33 ms, which yields a much slower
classification time than the multi-channel AO cell case. Projections have been made for improvements in all of these
figures, notably an increase in its frame rate to 1 kHz. Such improvements would be necessary to make liquid crystal
SLMs feasible for such a system.


6.  DIRECTED  GRAPH  CASE  STUDY


The algorithm was tested using standard 5×9 dot-matrix alphanumeric characters in 62 classes ('A' through 'Z',
'a' through 'z', and '0' through '9'). Samples of the characters are shown in Figure 5. Each character was described
by a 64-element binary vector, which was obtained by taking each row of the character and making that the next
five elements of the data vector. The remainder of the vector was zero-padded. The number of forward neighbors
for any node was chosen to be M=4. The graph was built one class at a time, using a threshold t of 0.99.

Figure 5: Standard 5×9 Dot-Matrix Alphanumeric Characters

Figure 6: The Addition of Node 'F' to the Five-Class Graph ('A' through 'E'): (a) the five-class graph 'A' through
'E'; (b) the six-class graph 'A' through 'F'

Figure 7: Meta-Vertices Masks used (Number of Pixels Per Quadrant) for Initial Input Node Tests

Table 1: Directed Graph for a Character Data Base

Figure 6a illustrates the initial five-class graph and the resulting graph when the sixth node ('F') was added to the
five-node classifier. This was done by first adding outgoing arcs from 'F', then by determining what arcs should be
broken to make ingoing arcs to 'F'. First, outgoing arcs were made from 'F' to the four nearest neighbors which had
the highest correlations with 'F' (in this case 'A', 'B', 'D', and 'E'). Next, ingoing arcs to 'F' were established by
checking each of the five previous nodes to see if 'F' correlated better than a given node's lowest correlation
neighbor. If this was the case for some node and if its lowest neighbor was reachable from 'F', then that arc was
broken and replaced with an arc to 'F' (Figure 6b). Table 1 shows the adjacency matrix A (with its elements noted)
for the actual graph obtained. Each row of the matrix shows the neighbors for the particular character in the left
margin.
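
The two-stage insertion just described (outgoing arcs first, then ingoing arcs) can be sketched as below. This is an illustrative reading of the procedure, not the authors' code: the graph is stored as a dictionary of neighbor lists, correlations are supplied in a nested dictionary, and the names `reachable` and `add_node` as well as the four-node example are hypothetical. Tie-breaking and indegree bookkeeping are not modeled.

```python
def reachable(neighbors, src, dst):
    """Depth-first search: is dst reachable from src along directed arcs?"""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(neighbors.get(n, []))
    return False


def add_node(new, corr, neighbors, M):
    """Insert `new` into the graph; corr[a][b] is the correlation of a with b."""
    old = list(neighbors)
    # Outgoing arcs: the M existing nodes that correlate best with the new node.
    neighbors[new] = sorted(old, key=lambda n: corr[new][n], reverse=True)[:M]
    # Ingoing arcs: a node drops its weakest neighbor in favor of the new node
    # if the new node correlates better and the dropped neighbor is still
    # reachable from the new node.
    for n in old:
        weakest = min(neighbors[n], key=lambda k: corr[n][k])
        if corr[n][new] > corr[n][weakest] and reachable(neighbors, new, weakest):
            neighbors[n] = [new if k == weakest else k for k in neighbors[n]]


# Hypothetical example: add node 'D' to a three-node graph with M = 2.
corr = {
    "A": {"B": 0.5, "C": 0.2, "D": 0.9},
    "B": {"A": 0.5, "C": 0.6, "D": 0.1},
    "C": {"A": 0.2, "B": 0.6, "D": 0.3},
    "D": {"A": 0.9, "B": 0.1, "C": 0.3},
}
neighbors = {"A": ["B", "C"], "B": ["C", "A"], "C": ["B", "A"]}
add_node("D", corr, neighbors, 2)
print(neighbors)
```

In this toy case 'A' and 'C' each replace their weakest arc with an arc to 'D', while 'B' keeps its arcs because 'D' correlates poorly with it.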

A meta-vertex was used at the starting node. It consisted of the four masks shown in Figure 7, which simply
counted the number of "on" pixels in each quadrant. For this problem the optimum number of nodes to be
examined (on the average) for classification is (16×2+(62−16)×3)/62=2.74, where examining a node refers to
examination of its M=4 nearest neighbors. This value is simply the average of the path lengths given an "optimal"
graph. The actual value obtained in tests on these data was 5.27, yielding the graph efficiency γ=0.52. While the
efficiency may seem low for this particular example, one should remember that γ reflects the interconnectedness in
the graph, which is achieved at the expense of some classification speed. More research is required to determine the
effects of various system parameters on γ. Without the input initial meta-vertices, performance was much worse (an
average of about 8.5 nodes per search).
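
The two figures quoted above can be reproduced directly. The sketch assumes that the graph efficiency is the ratio of the optimal average path length to the measured one, which is the reading consistent with the reported 2.74, 5.27 and 0.52.

```python
classes = 62
# 16 classes reached after 2 node examinations, the remaining 46 after 3.
optimal = (16 * 2 + (classes - 16) * 3) / classes
measured = 5.27                       # average reported in the tests
efficiency = optimal / measured       # assumed definition of the efficiency

print(round(optimal, 2), round(efficiency, 2))
```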

7.  CONCLUSIONS 

An algorithm has been presented to model a large-class problem or large knowledge base as a directed graph
classifier. It is shown that the classification procedure can yield the object class quite well. The proposed algorithm
can also be used to iteratively synthesize the graph one class at a time, while keeping the graph connected
and all classes reachable. Our algorithm allows the classifier to easily accommodate new classes, and it is especially
suitable for parallel processing architectures, such as optical systems. Initial results were most promising. This
concept appears to be useful in many new optical AI concepts.

I.  Introduction

A.  Background 

The use of spatial filters for the automatic recognition of targets has been widely studied. Typically,
such filters are synthesized to recognize complete objects. In this paper, we address the possibility of
identifying targets by parts (i.e., by partitioning the input image), and by symbolically analyzing the
partitions simultaneously.

The fundamental idea is to generate a symbolic description of the input image using spatial filters (also
referred to as correlation filters).¹ Separate filters are synthesized for different spatial regions of the
composite set of training images. A composite filter of all objects (with the spatial relationship between
segments preserved) is formed. It is then correlated with the input to obtain a symbolic or multibit code
description of the input object. The K-tuple synthetic discriminant function (SDF) investigated in previous
research² also yields a multibit output code. However, the prior K-tuple SDF differs from the scheme
proposed in this paper in one important aspect. Unlike the correlation filters employed for symbolic
processing, the prior K-tuple filter systems are synthesized from entire training images. The advantages of
our new proposed scheme will be discussed shortly.

Some relatively simple 3-D objects such as aircraft can be numerically modeled on a computer.³ Most
aircraft are a collection of generic parts whose dimensions differ from model to model. Computer
algorithms can efficiently generate the images of most aircraft parts and combine them to produce realistic
images of existing civilian and military aircraft. This is possible mainly because the number of aircraft parts
is small, because aircraft have a consistent set of generic parts and because they can be modeled by simple
geometric shapes such as cones, cylinders, and planes. Hence correlation filters can be synthesized for various
aircraft parts, and the target class be identified on the basis of those parts which are visible in the input
image.

A category of objects such as tanks is more difficult to model because the number of variations in
structure, shape, and size is very large. Computer programs for modeling tanks exist¹ but result in very
specific models for each tank. It is difficult to obtain images for individual tank parts from computer
models and thus correlation filters synthesized from tank parts are not easy to assemble. In this paper, we
propose an alternative scheme based on spatially partitioning training set images that serves the same
purpose of recognition by parts for more complicated objects such as tanks.

B.  Practical  Motivation 

As stated in Sec. I.A, it is conventional to synthesize distortion-invariant linear combination correlation
filters from complete training images. However, problems may arise when parts of the object are absent or
invisible due to occlusion by artifacts in its natural environment (such as foliage, terrain, camouflage
measures), noise in the input, temperature variations when an infrared imaging sensor is used, sensor
malfunction, or a host of other possible reasons. In situations where the entire target is not visible, it is
preferable to identify its observable parts and from these logically deduce its class. Analogously, one can
determine the more reliable parts of the object and give them more weight than other parts. Our
proposed symbolic processor is motivated by this set of practical considerations.

The inference of object class from a study of the visible object parts requires “abductive reasoning.”³
Formally speaking, abductive reasoning involves the establishment of pertinent facts to infer a new fact.
Since more than one answer is often possible, abductive reasoning must also yield which answer is the best.
To make decisions of this nature, we must weigh the available evidence. To do this, we must know how
strongly a fact weighs for or against a conclusion, and how to combine the pieces of evidence into a final
conclusion. To gain evidence, it is necessary to obtain prior and conditional (a posteriori) probabilities. A
technique to achieve this will be discussed in further detail in Sec. V.

An expert system is often defined as a rule-based application program for performing tasks which
require expertise. While there is no necessary connection between expert systems and abductive reasoning,
most expert systems perform abductive tasks. Conversely, most of the standard examples of programs
which do abductive reasoning in the presence of uncertainty are expert systems. With these considerations,
we can refer to the proposed rule-based scheme as an expert system.

The definition of the problem is given in Sec. II, and the concept of dividing an image into partitions is
explained there. The various considerations for correlation filter synthesis (i.e., their size, number, output
assignments, and training sets) are discussed in Sec. III. A statistical motivation for the proposed scheme
is advanced in Sec. IV along with illustrative examples. Section V is a description of the rule-based symbolic
processor, and how expertise and evidence are incorporated into the program. Initial test results are
reported in Sec. VI. A summary of the paper is given in Sec. VII.

II.  Problem  Definition 

We wish to design a system capable of identifying, recognizing and classifying objects in the face of 3-D
distortions. Our case study is confined to a tank and an armored personnel carrier (APC). However, the
basic concept has far more generality. The filter is intended to be invariant to aspect-view distortions.
To provide this, we employ training images (of the target objects at several different aspect views) as
detailed elsewhere.¹ We partition these input training images into several subimages, and synthesize
correlation filters for each partition. The goal is to use correlation filters to generate a multibit multiple
filter description (or symbolic code) for each object for distortion and shift-invariant symbolic object
classification.

Once representative images of each object have been selected for training the correlation filters, these
training set images are partitioned into M subimages, or partitions, of k × k pixels each. We assume an input
object resolution of d × d pixels. Thus,

$$M \cdot k^2 = d^2. \qquad (1)$$

Table I. Terms and Definitions for Filter Synthesis

Term   Definition                         Value
d      1-D image dimension                32
k      1-D partition dimension            8
M      Number of partitions               16
N      Total number of training images    12



We use the symbol $\omega_{ij}$ to denote the ith subimage of the jth training image. Therefore
$1 \le i \le M$ and $1 \le j \le N$. The terms partition and subimage will be used interchangeably in this
discussion.

We propose that correlation filters $f_i$ be synthesized for each partition, $1 \le i \le M$. The filters
$f_i$ are assumed to be functions of the training subimages $\omega_{ij}$ (for all j) and to be of dimensions
k × k. The correlation filter synthesis procedure is not important for the discussion in this paper. We use
minimum average correlation energy (MACE)⁶ filters in our work because of their time and memory efficient
synthesis, and their ability to form good correlation peaks.

III.  Criteria  for  Filter  Synthesis 

In this section, we discuss relevant synthesis criteria such as the designation of filter outputs and the
selection of training sets for the filters. The proposed scheme is best described by means of the diagram in
Fig. 1.

We use sixteen partitions (M = 16) in our work. The outputs from the corresponding sixteen filters are
collectively denoted by the 16-element output vector v. The layout is shown in Fig. 1. The image is divided
into sixteen subimages, each of which is a partition. The partitions are numbered from 1 to 16 as in Fig. 1.
The training set of the filter $f_i$ ($1 \le i \le 16$), corresponding to the ith partition, is simply the
collection of the ith subimages in all complete images in the data base. The training set for the ith filter is
represented by $\phi_i = \{\omega_{ij},\ j = 1, \ldots, N\}$.

The data base chosen for our work consists of six complete images of the tank and six images of the APC.
The images were taken at a depression angle of 60° and were evenly spaced every 60° about the normal. Since
there were six images per class, the training set $\phi_i$ for each filter $f_i$ included 2 × 6 = 12 subimages
$\omega_{ij}$, $1 \le j \le 12$. The data base images were 32 × 32 pixels (i.e., d = 32). Since M = 16, we
select k = 8 to satisfy Eq. (1). The four synthesis parameter values for d, k, M, and N are listed in Table I.
The entire data base contains seventy-two images, thirty-six of the tank and thirty-six of the APC, each image
being a different aspect view with 10° increments in aspect angle used.
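
The partitioning step and the constraint of Eq. (1) can be illustrated with a short sketch. It assumes the partitions are numbered row by row (the layout of Fig. 1 is not reproduced here), uses a synthetic image whose pixel values are simply their own index, and the function name `partition` is hypothetical.

```python
def partition(image, k):
    """Split a d x d image (a list of rows) into k x k subimages, row by row."""
    d = len(image)
    assert d % k == 0 and all(len(row) == d for row in image)
    blocks = []
    for top in range(0, d, k):
        for left in range(0, d, k):
            blocks.append([row[left:left + k] for row in image[top:top + k]])
    return blocks


d, k = 32, 8
# Synthetic test image: pixel (r, c) holds the value r * d + c.
image = [[r * d + c for c in range(d)] for r in range(d)]
subimages = partition(image, k)
M = len(subimages)
print(M, M * k * k == d * d)
```

With d = 32 and k = 8 the sketch yields M = 16 subimages, matching the values in Table I.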

The desired filter outputs must also be specified for both classes of data. Two choices for the filter outputs
are shown in Figs. 2(a) and (b). These were used for the tank and the APC, respectively. The value (1 or 0)
in each square in the correlation output represents the output of the corresponding partition of the filter.
Thus, as seen in Fig. 2(a), the sixteen filters $f_i$ yield an output of 1 for odd values of i (and 0 otherwise),
when the input image is a tank. This output vector for the tank is denoted by $v_1$, which is obtained by
lexicographically ordering the elements of Fig. 2(a). Similarly, Fig. 2(b) shows the desired outputs for an APC
input. The corresponding output vector is denoted by $v_2$. If the filter outputs are set to be $v_1$ or $v_2$
for each target class for all images in the data base, the output vectors $v_1$ and $v_2$ are invariant to 3-D
distortions of the targets to be classified. During filter synthesis, we specify that the training set objects
have these two output patterns. Thus, we achieve a unique 16-bit symbolic correlation output description for
each input object.

Fig. 2. (a) Partitioned output pattern for a tank; (b) partitioned output pattern for an APC.

The Fourier transform of the filter f (with M = 16 outputs as shown in Fig. 2 in the space domain) is
synthesized as a matched spatial filter in the frequency domain of a frequency plane correlator.⁷ This
produces one filter with each of its M = 16 partitions on a different spatial frequency carrier (with frequency
proportional to the subimage's location in the filter). The correlation output for such a filter yields a 4 × 4
array of correlation values (a 16-bit symbol) for each occurrence of a tank or APC in the input. The
symbolic pattern of Fig. 2(a) will result when the input is a tank and the pattern in Fig. 2(b) will result when
the input is an APC. The spatial location of the pattern denotes the object's position in the input image plane.
Thus, one uses such a correlator in the conventional manner but searches the correlation plane for specific
4 × 4 symbolic patterns, descriptive of different objects.

IV.  Statistical  Motivation 

A  statistical  motivation  for  the  proposed  scheme 
may  be  gained  from  the  following  considerations. 
Typically,  the  pattern  recognition  schemes  that  use 
correlation  filters  have  a  high  false  alarm  rate.  The 
problem  of  false  alarms  has  not  been  addressed  fully 
and  is  an  important  topic  for  future  research.  In  this 
paper,  we  briefly  describe  a  potential  solution  to  the 
problem,  but  will  defer  the  details  of  the  analysis  to  a 
future  publication. 


Consider a single correlation filter employed for target recognition. When the correct object is present at
the input, the output correlation peak is at a user-specified value. This is true provided the filter is
distortion invariant (one approach to this is to make the proper choice of the training set images). The
value of the output peak determines the class of the input image. Unfortunately, it can be shown that an
infinite number of images exist that yield correlation peak outputs exactly equal to those specified during
filter synthesis by the user. Thus, even in the absence of any target, the filter may output correlation values
equal to or close to those specified for targets and thereby give rise to false alarms. Decisions based on a
single filter are hence unreliable. In formal terms, the constraints imposed during filter synthesis are
necessary but not sufficient for target recognition.

It can be shown that the simultaneous use of more than one filter reduces the false alarm rate. The
simultaneous use of multiple filters (such as the K-tuple SDF) has been suggested in previous research
(although not for these specific reasons). Our present scheme based on partitioned images achieves a lower
false alarm rate because more constraints have to be satisfied simultaneously. As stated earlier, we do not
provide a detailed analysis in this paper. However, we now offer intuitive insight into the problem and its
solution.

In the following, we shall represent a d-dimensional vector space S and its subsets $S_i$ by plane figures as in
Fig. 3. The plane S represents the whole set of possible images that could ever appear at the input of the
correlator. Assume that a filter $f_1$ is synthesized such that an output of $u_1$ is obtained whenever the
target is present at the input. Since images other than the target exist that yield an output $u_1$, we denote
the subspace of all such images by the region $S_1$. Thus all images inside $S_1$ are potential sources of false
alarms with the filter $f_1$. Now assume that we employ M filters $f_i$, $1 \le i \le M$. For each filter $f_i$,
there exists a subspace of images $S_i$ (similar to $S_1$) that yield false alarms. All images in $S_i$ thus
satisfy the constraints imposed on the filter $f_i$ during synthesis. The M subspaces $S_i$ for the M filters
are shown in Fig. 4. By definition, all these subspaces must contain the training set images, and hence must
have a nonzero intersection I. Moreover, an image must belong to this intersection to simultaneously satisfy
all M filters. For a multifilter system, a false alarm is said to occur if, in the absence of a target, all M
filters output correct correlation values. Therefore for a false alarm to occur with multiple filters, the input
must yield M correct outputs simultaneously. Only images in the intersection region have this property (since
images in I by definition yield correct outputs for $f_1, f_2, \ldots, f_M$). Thus images that cause false alarms
with M filters must belong to the intersection set I. From Fig. 4 it is evident that the number of false alarms
is less for M filters (than for any single $f_i$) since the intersection I is smaller than any of the individual
subspaces $S_i$. Moreover, the intersection becomes smaller as the number of filters (and hence the number of
subspaces that must intersect) increases, indicating a diminishing false alarm rate for a larger number of
image partitions.

Fig. 3. Domain $S_1$ of false alarm images in the space S of all images (for the case of a single filter).

15 November 1987 / Vol. 26, No. 22 / APPLIED OPTICS 4797

The information in Fig. 4 can be interpreted in terms of the probability of false alarms. It can be shown, in
rather general conditions, that a system using filters synthesized from complete images (without
partitioning the data) has a higher probability of false alarm than a system employing multiple filters. The
symbolic and associative postprocessing we perform allows flexibility in assigning objects to a class when the
intersection region I in Fig. 4 becomes too small for a given set of data.
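
The shrinking-intersection argument of this section can be illustrated with a deliberately simple toy model. Everything in it is an assumption made for illustration: "images" are 8-bit tuples, and toy filter i is said to respond correctly whenever bit i of the image equals bit i of the target. Real correlation filters would not divide the image space this cleanly, so the sketch shows only the set logic, not actual false alarm rates.

```python
from itertools import product

BITS = 8
target = (1, 0, 1, 0, 1, 0, 1, 0)
universe = list(product((0, 1), repeat=BITS))   # every possible toy image


def accepted_by(i):
    """Toy S_i: images on which filter i gives the same output as the target."""
    return {img for img in universe if img[i] == target[i]}


intersection = set(universe)
sizes = []
for i in range(BITS):
    intersection &= accepted_by(i)   # require one more filter to agree
    sizes.append(len(intersection))
print(sizes)
```

Each added toy filter halves the set of images that satisfy all filters at once, which mirrors the qualitative claim that the intersection I shrinks as the number of filters grows.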

V.  Probabilistic  Rule-Based  Recognition 

In this section, we describe criteria for basic rule formulation for the recognition of targets using the
output symbolic vectors $v_s$. Guidelines are provided for incorporating new rules into the system, via
interactive exchange of information. The criteria for assigning confidence measures (probabilities) to each
decision are also discussed.

We wish to determine the conditional probability P(Tank/|v| > T) (i.e., the probability that the input
image is a tank given the observation v, where T is a threshold value). A purely statistical solution to the
problem would be to obtain estimates for P(|v| > T) and to use Bayes rule⁸ to obtain an estimate for
P(Tank/|v| > T), assuming a priori probabilities for P(Tank). However, it is generally difficult to obtain
all the necessary estimates for P(|v| > T), because of the large number of possibilities. Thus we resort to
abductive reasoning to provide a solution.
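
For reference, Bayes rule for this problem can be written out as follows. The expression uses a vertical bar for conditioning in place of the slash used in the text. It is the standard form of the rule, not a formula taken from this paper, and it makes explicit that the class-conditional term P(|v| > T | Tank) is also required.

```latex
P(\mathrm{Tank} \mid |v| > T)
  = \frac{P(|v| > T \mid \mathrm{Tank})\, P(\mathrm{Tank})}{P(|v| > T)}
```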

Given a measured output vector v, the system determines a limited number of ways in which the
observation could have resulted from image distortions, missing parts, etc., and the probability associated
with each. The system then uses abductive reasoning to determine possible output filter element errors. Once
a filter output is suspected of error, its symbolic value is altered to test for better matches with the
descriptions stored in memory. During system test runs, we develop an a priori belief in specific filter
outputs by observing that some filter outputs are in error less frequently than others. In operation the
system is then instructed to examine these more reliable filter outputs in certain conditions and to ignore
other symbolic outputs. The decisions made in such conditions (i.e., ignoring certain symbols) are assigned a
lower confidence. We now detail these techniques.

A.  Rule  Formation  Introduction 

Target recognition is a trivial task if the input image is represented in the data base. In this case, the
output vector is expected to exactly match the 16-bit patterns in Fig. 2(a) or (b). We will refer to the proper
output vector ($v_1$ or $v_2$) simply as the output vector v. A simple rule for target recognition in this case
is:

Rule 1:

(1)  Assign the symbol A to the symbolic outputs that are 1, and the symbol B to outputs that are 0.

(2)  If elements (1,5,9,13) and (3,7,11,15) of v are A and elements (2,6,10,14) and (4,8,12,16) are B, the
input is a tank with confidence = 1.0.

(3)  Else, if the complement of (2) is true, the input is an APC with confidence = 1.0.

(4)  Else, set error flag (1) and confidence = 0.0.

End rule 1.

This  rule  operates  on  the  output  vector  v.  We  treat 
the  outputs  1  and  0  as  symbolic  values  and  assign  them 
the  symbols  A  and  B.  The  complement  rule8  in  step 
(3)  evaluates  the  complement  of  the  rule  in  step  (2).  If 
v  does  not  satisfy  the  rule,  this  is  a  procedure  error  and 
the  flag  in  step  (4)  is  used  to  record  this  fact.  Sec.  V.C 
provides  further  rules  and  how  they  are  learned. 
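
Rule 1 translates almost line for line into code. The sketch below is an illustration under the assumption that the output vector has already been thresholded to 0/1 values. The names `rule_1`, `TANK_A` and `TANK_B`, and the return format, are hypothetical.

```python
TANK_A = (1, 5, 9, 13, 3, 7, 11, 15)    # elements that must be A for a tank
TANK_B = (2, 6, 10, 14, 4, 8, 12, 16)   # elements that must be B for a tank


def rule_1(v):
    """Apply Rule 1 to a 16-element vector of 0/1 outputs.

    Returns (label, confidence, error_flag)."""
    symbols = ["A" if bit == 1 else "B" for bit in v]       # step (1)

    def all_equal(indices, symbol):
        # Indices are 1-based, as in the text.
        return all(symbols[i - 1] == symbol for i in indices)

    if all_equal(TANK_A, "A") and all_equal(TANK_B, "B"):   # step (2)
        return "tank", 1.0, False
    if all_equal(TANK_A, "B") and all_equal(TANK_B, "A"):   # step (3)
        return "APC", 1.0, False
    return None, 0.0, True                                   # step (4)


v1 = [1 if i % 2 == 1 else 0 for i in range(1, 17)]   # tank pattern (odd elements are 1)
v2 = [1 - bit for bit in v1]                           # APC pattern (complement)
print(rule_1(v1), rule_1(v2), rule_1([1] * 16))
```

Any vector that matches neither pattern exactly falls through to step (4), which is the case the later rules are meant to handle.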

B.  Multiple  Filter  Banks 

In a real environment, it is unlikely that input images will perfectly match any image in the data base,
since input images can be distorted by 3-D rotations of the target or by occlusion of target parts by natural
and man-made artifacts. Our processor adapts to such situations as we now describe.

To improve the decision making process, we employ a set of S symbolic filters (with M partitions in each).
We refer to the set of filters as a filter bank. We thus perform the correlation of the input data with S
filters. For a single input object, there will be S output correlation planes and each will contain an M element
output vector $v_s$ (the symbolic pattern chosen).

Figure 5 shows a correlator with S = 4 multiple correlation planes at $P_3$ that are the correlation of the
$P_1$ data with S = 4 different spatially multiplexed filters at $P_2$. The holographic optical element (HOE)
$L_1$ provides a spatial replication of the Fourier transform of the $P_1$ data at four separate locations in
$P_2$. Four space-multiplexed filters with HOE Fourier transform lenses are used at $P_2$. One can also achieve
multiple correlations using frequency-multiplexed filters at $P_2$ as shown in Fig. 6. In both architectures,
each correlation plane contains a 4 × 4 spatial pattern (the symbolic code chosen, such as those in Fig. 2) at
spatial locations corresponding to each occurrence of one of the objects in the $P_1$ data.

In our initial symbolic processor tests, we used S = 3
filter banks with M = 16 filters in each. Each object is
thus described by three vectors v_s1, v_s2, v_s3, with a total
of 3 X 16 = 48 elements. Figure 6 shows the second
output vectors (v_s2,1) and (v_s2,2) for the class 1 and 2
objects and the third output vectors (v_s3,1) and (v_s3,2)
chosen. Each vector pair is a vector and its complement.
The advantage of using a filter bank is that an
error in one output vector can be confirmed (or invalidated)
using the remaining S - 1 output vectors. We
now describe how rules were developed interactively to
achieve this.

C.  Interactive  Knowledge  Acquisition 

In each object class, thirty-six images (at 10° aspect
increments) exist. The filters f for the various filter
banks were formed from six images/class (at 60° increments
in aspect), i.e., using twelve of the seventy-two
possible images in the 2 classes (tank and APC). The
three filter banks were formed and encoded as in Figs.
2 and 7. The three filter output vectors v_s1 to v_s3
obtained were measured and stored. Although the
ideal symbolic patterns contain ones and zeros, the
actual filter outputs are values between 0 and 1 (partial
truth).

The program first attempts to classify the output
vectors using rule 1 (Sec. V.A) applied to all three
vectors v_s1, v_s2, and v_s3. For a decision to be made, all
three output vectors must satisfy rule 1. If a decision
is not possible, it is assumed that errors have occurred
in the vectors that failed rule 1. The user is interrogated
for the class of the input image. The three output
vectors and the user's choice for object class are stored.
The program proceeds in this manner until this information
has been obtained on all seventy-two images.
After storing the three output vectors and the user-specified
class for all test images, the program interrogates
the user for the number of rules that should be
used for decision making. An iterative search (Ref. 10) is then
initiated to find these rules, such that the number of
errors obtained using each rule is as small as possible.
We now detail this procedure.

To illustrate this procedure, consider the tank images
as inputs. It is found that for thirty of the thirty-six
tank images (i.e., all nontraining set images), the
fourth element of vector v_s1 is in error [i.e., it should be
0 as shown in Fig. 2(a), but was 1]. It is also found that
for the same thirty test images, the seventh and eighth
elements of v_s2 and the twelfth element of v_s3 are in
error. Therefore a possible second rule is:

Rule  2:  failing  rule  1  then 

(1) If all elements of v_s1 match except for element (4)
and v_s2 matches except for elements (7,8) and v_s3
matches except for element (12), the input is a tank
with probability = 0.86 (confidence = 0.79).

(2) Else, if the complement (of step 1) is true, the
input is an APC with probability = 0.80 (confidence
= 0.73).

(3)  Else,  set  error  flag  (2)  and  confidence  =  0.0. 

End  rule  2. 
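
Rule 2 amounts to a masked comparison: each output vector is compared with its ideal code, skipping the elements found to be unreliable. The Python sketch below is illustrative only. The ignored positions, probabilities, and confidences come from the rule as stated; the ideal codes in `IDEAL_TANK` are hypothetical placeholders (the actual codes are those of Figs. 2 and 7), and the function names are assumptions.

```python
# 0-based positions ignored by rule 2, taken from the text:
# element 4 of v_s1, elements 7 and 8 of v_s2, element 12 of v_s3.
IGNORED = [{3}, {6, 7}, {11}]

# Hypothetical ideal tank codes, one 16-element code per filter bank.
IDEAL_TANK = [
    [1, 0] * 8,
    [1, 1, 0, 0] * 4,
    [1, 1, 1, 1, 0, 0, 0, 0] * 2,
]
# The APC codes are the complements of the tank codes.
IDEAL_APC = [[1 - x for x in code] for code in IDEAL_TANK]


def matches(vectors, ideal, ignored):
    """True if every vector agrees with its ideal code outside the ignored positions."""
    return all(
        v[i] == ref[i]
        for v, ref, skip in zip(vectors, ideal, ignored)
        for i in range(len(ref))
        if i not in skip
    )


def rule_2(vectors):
    """Return (class_label, probability, confidence, error_flag)."""
    if matches(vectors, IDEAL_TANK, IGNORED):     # step (1)
        return "tank", 0.86, 0.79, 0
    if matches(vectors, IDEAL_APC, IGNORED):      # step (2)
        return "APC", 0.80, 0.73, 0
    return None, 0.0, 0.0, 2                      # step (3): set error flag (2)
```

Because a mismatch in any one of the three vectors makes `matches` return false, the sketch also reflects the condition that all output vectors must satisfy a rule for it to fire.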

Using this rule, it was found that thirty-one out of
the thirty-six tank images satisfied the match requirements,
and hence were correctly identified (while none
of the APC test images satisfied the rule). Thus the
probability that an image that satisfies rule 2 is a tank
is 31/36 = 0.86. This is how the probability values
noted in steps (1) and (2) in rule 2 were obtained. If
the match technique fails for the tank, the complementary
rule is evaluated for APCs as in step (2). It was
found that twenty-nine APCs satisfied the complementary
rule, and thus the probability for APCs is
estimated to be 29/36 = 0.80 as noted in step (2).

It is necessary to distinguish between confidence
and probability measures. As the number of the rule
used increases, more and more vector elements are
ignored. Since fewer symbols are taken into consideration,
the confidence in higher rules must be lower.
However, the probability that higher rules are satisfied
is larger, because fewer vector elements are used for
making a decision. Thus we need to compensate by
including the number of elements examined in the
expression for the confidence. This is easily done by
setting

\text{confidence} = \text{probability} \times \frac{\text{number of elements examined}}{\text{total number of elements}}. \qquad (2)

For rule 1, we use a confidence of 1.0, since if it is
satisfied, we have perfect confidence (ignoring the possibility
of false alarms) in the class estimate it gave.
For rule 2, the confidence from Eq. (2) is 0.86(44/48) = 0.79
and 0.80(44/48) = 0.73 for the tanks and the APCs,
respectively. Thus for low numbered rules (using
more of the vector elements), the confidence is approximately
equal to the probability that the rule is satisfied
(since the number of symbols used for decision
making is close to the total number of symbols). However,
for higher numbered rules, the confidence is a
fraction of the probability, reflecting the fact that
some information was ignored in making the decision.
We used five rules. The data for these are provided in
Table II for rules 1-5 and their complements 1c-5c.
The confidence of each rule decreases as expected and
the number of symbolic elements (out of forty-eight)
ignored increases as shown.
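
The rule 2 figures quoted above can be checked with a few lines of Python. This is a sketch only; the function name `confidence` and its argument names are assumptions, and the two-decimal probabilities are taken directly from the text.

```python
def confidence(probability, examined, total=48):
    """Eq. (2): probability scaled by the fraction of symbolic elements examined."""
    return probability * examined / total


# Rule 2 ignores 4 of the 48 elements (one in v_s1, two in v_s2, one in v_s3),
# so 44 elements are examined.
p_tank = 31 / 36                    # fraction of tank images satisfying rule 2
tank_conf = confidence(0.86, 44)    # two-decimal probabilities as quoted in the text
apc_conf = confidence(0.80, 44)
print(round(p_tank, 2), round(tank_conf, 2), round(apc_conf, 2))
```

With all 48 elements examined, as in rule 1, the confidence reduces to the probability itself.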

The procedure failing rule noted at the start of rule 2
checks the error table to see if a given rule was violated
by the output vectors. This is required for determining
branch and termination conditions and is particularly
useful in programs with intricate feedback routes.
Since our rules have a precedence hierarchy, the procedure
failing rule is not absolutely necessary for our
present program execution. However, we included it
to accommodate the future development of the program
into a more complex rule-based algorithm. Note
that if any one of the S output vectors does not satisfy a
particular rule, the condition for failure is set.

Using rule 1 before rule 2 establishes a hierarchy for
rule usage. If an image is found to satisfy rule 1, it is
easily classified as either a tank or an APC. However,
most images may not satisfy rule 1, and rule 2 must be
applied as a second test. This rule is estimated to be
correct 86% of the time for tanks and 80% of the time
for the APCs. This percentage probability that the
rule is satisfied is then used in Eq. (2) to obtain a
confidence measure. All images satisfying rule 1 will
satisfy rule 2 also. The purpose of the interactive
procedure leading to our five rules is to determine the
most reliable symbols and the probabilities and confidence
of the class estimates for each rule. This general
technique to obtain the rules to be used results in a
final set of rules that is a decision tree. The technique
used to select the symbols used at successive levels is
general and can be applied to many problems. It is not
domain specific. The specific rules that result will
differ for each data set. Thus the method adapts to
different knowledge sources.

We now discuss rules that use the information in one
output vector to rectify errors in the others using a new
symbolic substitution rule. For example, suppose
that the fourth element of the vector v_s1 is in error for a
particular input image. The program assumes that
the part of the image in the fourth partition is missing,
or is severely distorted. Therefore, it assumes that the
fourth elements of vectors v_s2 and v_s3 are also in error
(since the replicas of the same image are input to all
filter banks). This rule module then alters the fourth
elements of v_s2 and v_s3 and checks to see if use of the
original or altered v_s2 and v_s3 vectors yields a better
match. Both possibilities are considered, since if the
proper element value is 0, it may not be altered, whereas
if its proper value were 1, it may be altered; or vice
versa, depending on the nature of the difference in the
corresponding region of the input. If altering the output
vectors in this manner provides a better match, the
assumption that a part of the image is distorted or
missing is validated. The input image is then classified
appropriately. In principle, this symbolic substitution
can be applied to more than one element of the
output vectors. This rule module would be applied to
each rule and then (if no match is obtained) the next
rule would be accessed. A straightforward procedure
can be devised to identify which elements of the output
vectors may be in error. A major advantage of a rule-based
recognition technique is the ability to anticipate
and  correct  errors  before  a  decision  is  made.1" 
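
The symbolic substitution step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `mismatches` and `substitute` are hypothetical, the comparison is made against the ideal codes of one hypothesized class, and "better match" is taken to mean fewer differing elements.

```python
def mismatches(v, ideal):
    """Number of positions where v differs from the ideal code."""
    return sum(1 for a, b in zip(v, ideal) if a != b)


def substitute(vectors, ideals, k):
    """Try altering element k of the second and third vectors.

    `vectors` and `ideals` are lists of three binary lists. k is the 0-based
    position already found to be in error in the first vector. For each of the
    other vectors, the original and the altered versions are both considered
    and the one closer to the ideal code is kept.
    """
    result = [list(vectors[0])]
    for v, ideal in zip(vectors[1:], ideals[1:]):
        altered = list(v)
        altered[k] = 1 - altered[k]
        if mismatches(altered, ideal) < mismatches(v, ideal):
            result.append(altered)   # the alteration improved the match
        else:
            result.append(list(v))   # keep the original vector
    return result
```

If the alteration improves the match for the hypothesized class, the assumption that the corresponding image partition is missing or distorted is supported; otherwise the original vectors are retained.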

The rules (see Table II) with highest confidence are
invoked first, since a later rule provides a lower confidence
than the previous rule. We emphasize that the
process of rule generation is an off-line interaction
between the programmer and the computer. Once the
set of rules is formulated, the program stores them in
the memory for on-line access. The technique used
for generating the rules attempts to maintain a hierarchy,
such that images that satisfy rules with higher
confidence will also satisfy rules with lower confidence.
This occurs in the present case. In general, most images
will satisfy more than one rule. The decision with
the highest vote of confidence is accepted as the best
choice for image classification.

D.  Associative  Memory 

If the confidence of the lowest rule with a match is
felt to be too small, the rule-based decision making is
deemed to be unreliable. In our five-rule system, we
always have a confidence of at least 0.45. However,
this will not be sufficient in most cases. Use of rules
beyond rule 3, where the confidence drops below 70%,
will generally not result in acceptable performance.
In such situations, the program resorts to matching the
v_s vectors to the closest ideal output set of vectors (by
minimizing the norm of the difference between the two
vectors). This is analogous to the information retrieval
process in an associative memory. Thus, we call an
associative processor one that returns three vectors
closest to the computed vectors v_s1, v_s2, and v_s3. We
could use S separate associative processors (one for
each of the S output vectors) to reduce crosstalk or
interference between symbols. With only sixteen-element
vectors, there is considerable crosstalk. At
present, we employ one autoassociative processor that
handles all six vectors (three per class object).

The design of associative memory processors is discussed
elsewhere (Ref. 11) and is not reviewed here. The
autoassociative memory matrix is given by the Moore-Penrose
generalized inverse

M = X (X^T X)^{-1} X^T, \qquad (3)

where the six columns of the matrix X are the three v_s
vectors for each class. The output vector that results
will be a minimum mean-square approximation to the
ideal data. While most errors in the input vector are
corrected by such associative processing, a few correct
symbol values may be altered (i.e., errors can be introduced
by the associative processor) to achieve the
minimum error value. The error correcting capability
of the associative processor depends on the size of the
vector space and the number of vectors stored, and is
better for higher dimensional input vectors. In our
case, the dimensionality of one input vector is 16,
which is relatively low. Thus dramatic improvements
are not expected. We could employ the three vectors
as one 48-element input vector and thereby improve
the performance of the associative processor. However,
our present purpose is not for the associative processor
to fully correct the input vector, but for the
combination of an associative processor and our rule-based
symbolic processor to be used. Memory size
and performance studies will determine the best symbolic
vector dimensionality to be employed. The output
vector obtained from the associative processor we
used is thresholded at 0.5 to obtain binary valued
symbolic vector elements. These resulting vectors are
then fed to our rule-based processor, which is then
checked for an improvement in the confidence of the
class estimate. An improvement is not always guaranteed
since the associative processor can change correct
symbols also as noted at the outset.
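
The recall-and-threshold step can be illustrated with a small pure-Python sketch. It is not the processor used in this work. It assumes the stored vectors are linearly independent, so that X^T X can be inverted directly. When a stored set contains vectors together with their complements, the columns can be linearly dependent and a generalized-inverse routine would be needed instead. The names `solve` and `recall` and the 8-element example vectors are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def recall(stored, v, threshold=0.5):
    """Apply M = X (X^T X)^{-1} X^T to v, then threshold to binary symbols.

    `stored` is a list of the stored vectors (the columns of X).
    """
    gram = [[sum(a * b for a, b in zip(p, q)) for q in stored] for p in stored]  # X^T X
    rhs = [sum(a * b for a, b in zip(p, v)) for p in stored]                     # X^T v
    coeffs = solve(gram, rhs)
    out = [sum(c * p[i] for c, p in zip(coeffs, stored)) for i in range(len(v))]
    return [1 if y > threshold else 0 for y in out]


# Two hypothetical stored codes and an input with its last element in error.
x1 = [1, 0, 1, 0, 1, 0, 1, 0]
x2 = [1, 1, 0, 0, 1, 1, 0, 0]
noisy = [1, 0, 1, 0, 1, 0, 1, 1]
print(recall([x1, x2], noisy))
```

The output before thresholding is a linear combination of the stored vectors, which is why an error that lies outside their span is removed, and also why correct symbols can occasionally be changed.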

VI.  Initial  Test  Results 

We now discuss the initial performance of our rule-based
symbolic processor. A bank of three filters was
formed with symbolic outputs for 2 classes as shown in
Figs. 2 and 7 from six images per class of aspect-distorted
tanks and APCs. A set of five rules for our
rule-based system was produced. Rules 1 and 2 were
presented earlier in Secs. V.A and V.C. Subsequent
rules were obtained similarly by noting which symbolic
elements were generally in error. Table II summarizes
the confidence for each rule for each object class. The
confidence is obtained as detailed in Sec. V.C and it is
seen to decrease for subsequent rules. This is expected
since, with fewer symbols used in subsequent rules,
we expect lower confidences in the class estimates
produced. The rules were then applied to the training
set images, and 100% correct results were obtained
(with confidence 1.0) as expected (see Table III). This
confirmed the proper synthesis of the symbolic filters.

The system was then tested with five images per
class (only one of these images per class was a training
image, the 0° view) with partitions 7 and 10 (see Fig. 1)
of each image removed to simulate data occlusion and
to test the system's performance. Table IV shows the
results. The first three columns in Table IV give the
test number, the aspect view, and the type of object.
The last three columns show the results obtained: the
class estimate, the rule number which the object first
passed, and the confidence of the rule (and hence the
confidence of the class estimate). As seen, one error
was obtained (for the class estimate in test 4). The
confidence of this estimate is low (62%) and thus would
be suspect. The remaining objects are correctly classified
with a confidence of at least 69%.

The error case in Table IV would be sensed by its low
confidence and thus the associative memory would be
used. For this case, the three distorted output vectors
v_s1, v_s2, and v_s3 computed for this image were fed to an
autoassociative processor whose memory contained
the ideal vector patterns. The output obtained from
the associative processor for each v_s input is a linear
combination of the ideal stored vectors. This output
was thresholded at 0.5 to obtain three new output
vectors. Our rule-based system was then again applied
to these new vectors. The resulting decision in
this case was correct (i.e., the image in test 4 was now
identified as a tank) using rule 3 with a confidence of
0.76. Examining the input and output vectors from
the associative memory, we found that thirteen symbols
were in error prior to associative processing, and
that the number of symbol errors was reduced to nine
by the associative processor. In this case, none of the
symbols that were originally correct was found to be in
error after associative processing.

Table III. Results of Tests Using the Twelve Training Set Tank and APC Images

Test     Aspect view   Actual   Class      Rule
number   (deg)         class    estimate   number   Confidence
1        0             Tank     Tank       1        1.0
2        60            Tank     Tank       1        1.0
3        120           Tank     Tank       1        1.0
4        180           Tank     Tank       1        1.0
5        240           Tank     Tank       1        1.0
6        300           Tank     Tank       1        1.0
7        0             APC      APC        1        1.0
8        60            APC      APC        1        1.0
9        120           APC      APC        1        1.0
10       180           APC      APC        1        1.0
11       240           APC      APC        1        1.0
12       300           APC      APC        1        1.0

Table IV. Results of Tests Using Five Tank and Five APC Test Set Images
with Two of the Sixteen Partitions of Each Image Omitted (by Occlusion)

Test     Aspect view   Actual   Class      Rule
number   (deg)         class    estimate   number        Confidence
1        0             Tank     Tank       1             1.0
2        20            Tank     Tank       2             0.79
3        50            Tank     Tank       2             0.79
4        [illegible]   Tank     APC        [illegible]   0.62
5        110           Tank     Tank       3             0.76
6        0             APC      APC        1             1.0
7        20            APC      APC        [illegible]   0.69
8        50            APC      APC        [illegible]   [illegible]
9        [illegible]   APC      APC        [illegible]   0.69
10       110           APC      APC        [illegible]   0.69

We have thus seen an example in which an associative
processor can be used to correct errors in the output
vectors v_s. This occurs because the associative processor
effectively utilizes all available information to
make an optimal mathematical guess. Unlike an associative
processor, the proposed rule-based processor
only examines the most reliable symbols, and hence
ignores some information at each rule. The flexibility
of the rule-based processor is in its ability to provide a
logical decision (along with a confidence measure)
even when the input information is incomplete. This
was successfully demonstrated in our initial tests in
Table IV. We expect that the use of autoassociative
memories in a rule-based symbolic processor will improve
performance of the system in most cases (as long
as the number of errors is modest, the number of
classes is not excessive, and the dimensionality of the
symbolic vectors is sufficiently large).


VII.  Conclusion 

In this paper, we have outlined a system capable of
recognizing targets even when parts of the object are
not visible. Motivation was provided for filtering by
parts and an example was given to illustrate the possible
advantages of synthesizing symbolic correlation
filters formed from subimages of objects. A system
was devised and simulated for demonstration purposes.
Initial simulation results were encouraging and
demonstrated 3-D distorted object recognition with
occluded object parts. We also showed that an associative
processor can be used in conjunction with the
rule-based system to improve performance. We detailed
how the system's rules are developed via off-line
interactions between the programmer and the computer.
The use of symbolic substitution for error compensation
was also suggested.

Further tests with this concept and its various aspects
are required. This requires devising more robust
rules. It also includes further use of the ability of
the system to predict errors and compensate for them
using multiple filter banks. The use of abductive reasoning
for developing the programs necessary for this
appears quite attractive. Efficient methods of updating
the correlation filters on-line (involving the addition
or removal of training images) and the memory
storage requirements of such a system are other topics
for future investigations.

We acknowledge the support of different aspects of
this work by various contractors (General Dynamics
Valley Systems Division, the Defense Advanced Research
Projects Agency, and the Air Force Office of
Scientific Research).

compute geometric moments", SPIE Proc., Vol. 638, pp. 32-40, March-April 1986.

42. B.V.K. Vijaya Kumar, "Geometric moments from Hartley transform intensities",
SPIE Proc., Vol. 639, pp. 253-59, March-April 1986.

43. B. Montgomery and B.V.K. Vijaya Kumar, "Nearest-neighbor non-iterative error
correcting optical associative memory processor", SPIE Proc., Vol. 638, pp. 83-90,
March-April 1986.

44. B.V.K. Vijaya Kumar and E. Pochapsky, "Signal-to-noise ratio considerations in
modified matched spatial filters", JOSA-A, Vol. 3, pp. 777-786, June 1986.

45. B.V.K. Vijaya Kumar and C. Rahenkamp, "Calculation of geometric moments using
Fourier plane intensities", Applied Optics, Vol. 25, pp. 997-1007, 1986.

46. B.V.K. Vijaya Kumar and S. Rajan, "Subpixel delay estimation using group-delay
functions", SPIE Proc., Vol. 697, pp. 187-196, August 1986.

47. B. Montgomery and B.V.K. Vijaya Kumar, "Evaluation of the use of the Hopfield
neural net model as a nearest-neighbor algorithm", Applied Optics, Vol. 25, pp.
3759-66, 15 October 1986.

48. B.V.K. Vijaya Kumar, "Minimum variance synthetic discriminant functions",
JOSA-A, Vol. 3, pp. 1579-84, October 1986.

49. B.V.K. Vijaya Kumar, "Geometric moments computed from the Hartley transform",
Optical Engineering, Vol. 25, pp. 1327-32, December 1986.

50. C. Rahenkamp and B.V.K. Vijaya Kumar, "Modifications to the McClellan, Parks
and Rabiner computer program for designing higher order differentiating FIR filters",
IEEE Trans. ASSP, Vol. 34, pp. 1671-74, December 1986.

51. D. Casasent and W. Rozzi, "Modified MSF Synthesis by Fisher and Mean-Square
Error Techniques", Applied Optics, Vol. 25, pp. 184-187, 15 January 1986.

52. D. Casasent and A.J. Lee, "A Feature Space Rule-Based Optical Relational Graph
Processor", Proc. SPIE, Vol. 625, pp. 234-243, January 1986.

53. D. Casasent, "Optical AI Symbolic Correlators: Architecture and Filter
Considerations", Proc. SPIE, Vol. 625, pp. 220-225, January 1986.

54. D. Casasent, S.F. Xia, J.Z. Song, and A.J. Lee, "Diffraction Pattern Sampling Using
a Computer-Generated Hologram", Applied Optics, Vol. 25, pp. 983-989, 15 March
1986.

55. D. Casasent, "Scene Analysis Research: Optical Pattern Recognition and Artificial
Intelligence", SPIE, Advanced Institute Series on Hybrid and Optical Computers,
Vol. 634, Leesburg, Virginia, March 1986.

56. D. Casasent and S.A. Liebowitz, "Model-Based System for On-Line Affine Image
Transformations", Proc. SPIE, Vol. 638, pp. 66-75, March-April 1986.

57. D. Casasent, "Optical Computing at Carnegie-Mellon University", Optics News,
Special Issue on Optical Computing, Vol. 12, pp. 11-13, April 1986.

58. D. Casasent and S.F. Xia, "Phase Correction of Light Modulators", Optics Letters,
Vol. 11, pp. 398-400, June 1986.

59. D. Casasent, "Optical Artificial Intelligence Processors", IOCC-1986 International
Optical Computing Conference, Proc. SPIE, Vol. 700, July 1986, pp. 246-250, 1986.

60. D. Casasent, J. Jackson and G. Vaerewyck, "Optical Array Processor: Laboratory
Results", IOCC-1986 International Optical Computing Conference, Proc. SPIE,
Vol. 700, July 1986, pp. 323-327, 1986.

61. D. Casasent and W.T. Chang, "Correlation Synthetic Discriminant Functions",
Applied Optics, Vol. 25, pp. 2343-2350, 15 July 1986.

62. D. Casasent and B. Telfer, "Distortion-Invariant Associative Memories and
Processors", Proc. SPIE, Vol. 697, August 1986.

63. D. Casasent and A.J. Lee, "An Optical Relational-Graph Rule-Based Processor for
Structural-Attribute Knowledge Bases", Applied Optics, Vol. 25, pp. 3065-3070, 15
September 1986.

64. S.A. Liebowitz and D. Casasent, "Hierarchical Processor and Matched Filters for
Range Image Processing", Proc. SPIE, Vol. 727, October 1986.

65. A. Mahalanobis and D. Casasent, "Large Class Iconic Pattern Recognition: An OCR
Case Study", Proc. SPIE, Vol. 726, October 1986.

66. D. Casasent and W. Rozzi, "Computer-Generated and Phase-Only Synthetic
Discriminant Function Filters", Applied Optics, Vol. 25, pp. 3767-3772, 15 October
1986.

67. D. Casasent, "Advanced Optical Pattern Recognition and Artificial Intelligence",
Laser Institute of America, ICALEO, November 1986.

68. D. Casasent, "Optical Data Processing", Article in Accent on Research, p. 23, 1986
(Carnegie Mellon University Magazine).

69. A. Mahalanobis, B.V.K. Vijaya Kumar and D. Casasent, "Spatial-Temporal
Correlation Filter for In-Plane Distortion Invariance", Applied Optics, Vol. 25, pp.
4466-4472, 1 December 1986.

70. J. Filler, D. Casasent and C.P. Neuman, "Factorized Extended Kalman Filter for
Optical Processing", Applied Optics, Vol. 25, pp. 1615-1621, 15 May 1986.


12.1.2 PAPERS PUBLISHED UNDER AFOSR SUPPORT IN THE PRESENT
TIME PERIOD (JANUARY-DECEMBER 1987)


71. D. Casasent and R. Krishnapuram, "Detection of Target Trajectories Using the
Hough Transform", Applied Optics, Vol. 26, pp. 217-251, 15 January 1987.

72. E. Baranoski and D. Casasent, "A Directed Graph Optical Processor", Proc. SPIE,
Vol. 752, pp. 58-71, January 1987.

73. D. Casasent, A. Mahalanobis and S.A. Liebowitz, "Parameter Selection for Iconic and
Symbolic Pattern Recognition Filters", Proc. SPIE, Vol. 754, pp. 284-303, January
1987.

74. D. Casasent and J.H. Song, "1-D Acousto Optic Processing of 2-D Image Data",
Proc. SPIE, Vol. 754, pp. 64-73, January 1987.

75. D. Casasent, "Optical Pattern Recognition and Artificial Intelligence: A Review",
Proc. SPIE, Vol. 754, pp. 2-11, January 1987.

76. D. Casasent, "Optical Pattern Recognition and AI Algorithms and Architectures for
ATR and Computer Vision", Proc. SPIE, Vol. 755, pp. 83-93, January 1987.

77. D. Casasent, "Electro Optic Target Detection and Object Recognition", Proc. SPIE,
Vol. 762, pp. 104-125, January 1987.

78. A.J. Lee and D. Casasent, "Computer Generated Hologram Recording Using a Laser
Printer", Applied Optics, Vol. 26, pp. 136-138, 1 January 1987.

79. D. Casasent and E. Botha, "Knowledge in Optical Symbolic Pattern Recognition
Processors", Optical Engineering, Special Issue on Optical Computing and
Nonlinear Optical Signal Processing, Vol. 26, pp. 34-40, January 1987.

80. D. Casasent, "Optical Information Processing", Sixth Edition, Encyclopedia of
Science and Technology, McGraw-Hill.

81. D. Casasent and R. Krishnapuram, "Curved Object Location by Hough
Transformations and Inversions", Pattern Recognition, Vol. 20, No. 2, pp. 181-188,
1987.

82. D. Casasent, S.F. Xia, A.J. Lee and J.Z. Song, "Real-Time Deformation Invariant
Optical Pattern Recognition Using Coordinate Transformations", Applied Optics,
Vol. 26, pp. 938-942, 15 March 1987.

83. S.A. Liebowitz and D. Casasent, "Error Correction Coding in an Associative
Processor", Applied Optics, Vol. 26, pp. 999-1006, 15 March 1987.

84. D. Casasent and A. Mahalanobis, "Rule-Based, Probabilistic, Symbolic Target
Classification by Object Segmentation", OSA Topical Meeting on Optical
Computing (March 1987), Technical Digest Series 1987, Vol. 11 (Optical Society of
America, Washington, D.C., 1987), pp. 155-158.

85.  D.  Casasent  and  S.A.  Liebowitz,  "Model-Based  Knowledge-Based  Optical 

Processors",  Applied  Optics.  Vol.  26.  pp.  1935-1942,  15  May  1987. 

86.  R.  Krishnapuram  and  D.  Casasent,  "Hough  Space  Transformations  for 
Discrimination  and  Distortion  Estimation",  Computer  \rision,  Graphics,  and  Image 
Processing.  Vol.  38,  pp.  299-316,  February  1987. 

87.  D.  Casasent  and  A.  Mahalanobis,  "Optical  Iconic  Filters  for  Large  Class 
Recognition",  Applied  Optics,  Vol.  26,  pp.  2266-2273,  1  June  1987. 

88.  R.  Krishnapuram  and  D.  Casasent,  "Optical  Associative  Processor  for  General  Linear 
Transformation",  Applied  Optics,  Vol.  26,  pp.  3641-3648,  1  September  1987. 

89.  A.  Mahalanobis,  B.V  K.  V ijaya  Kumar  and  D.  Casasent,  "Minimum  Average 
Correlation  Energy  (\L\CE)  Filters",  Applied  Optics.  Vol.  26,  pp.  3633-3640,  1 
September  1987. 

90. D. Casasent and B. Telfer, "Associative Memory Synthesis, Performance, Storage
Capacity and Updating: New Heteroassociative Memory Results", Proc. SPIE, Vol.
848, November 1987.

91. D. Casasent and S.I. Chien, "Rule-Based String Code Processor", Proc. SPIE, Vol.
848, November 1987.

92. D. Casasent and A. Mahalanobis, "Rule-Based Symbolic Processor for Object
Recognition", Applied Optics, Vol. 26, pp. 4795-4802, 15 November 1987.


12.1.3  BOOK  EDITING  AND  BOOK  CHAPTERS 

1.  Intelligent  Robots  and  Computer  Vision,  Ed.  D.  Casasent,  SPIE,  Vol.  726,  October 
1986. 

2. Hybrid Image Processing, Ed. D. Casasent and A. Tescher, SPIE, Vol. 638, April
1986.

3. "Optical Feature Extraction", D. Casasent, Chapter in Optical Signal Processing, pp.
75-95, Ed. by J.L. Horner, Pub. by Academic Press, San Diego, 1987.

4. "Optical Linear Algebra Processors", D. Casasent and B.V.K. Vijaya Kumar,
Chapter in Optical Signal Processing, pp. 389-407, Ed. by J.L. Horner, Pub. by
Academic Press, San Diego, 1987.

12.2  PRESENTATIONS  GIVEN  ON  AFOSR  RESEARCH 
(AUGUST  1984-DATE) 

September  1984 

1. Philips Research Laboratories - Briarcliff, NY - "Optics and Pattern Recognition in
Robotics".

2. Optical Society of America - Pittsburgh, PA, "CMU Center for Excellence in Optical
Data Processing".

3.  Carnegie-Mellon  University,  ECE  Graduate  Seminar  -  Pittsburgh,  PA,  "Optical 
Processing  Research  in  the  Center  for  Excellence  in  Optical  Data  Processing". 

4. Westinghouse Corporation - Baltimore, MD, "Research and Facilities in the Center
for Excellence in Optical Data Processing".

October  1984 

5. Washington, D.C., "Optical Pattern Recognition: Feature Extraction".

6. Washington, D.C., "Optical Pattern Recognition: Correlators".

7. Washington, D.C., "Synthetic Discriminant Function Case Studies".

8.  Washington,  D.C.,  "Basic  Optical  Signal  Processing  Architectures  and  Algorithms". 


9. Washington, D.C., "Advanced Optical Signal Processing Architectures and
Algorithms".

10. Washington, D.C., "Optical Linear Algebra Processor Algorithms and
Architectures".

11. Washington, D.C., "Optical Linear Algebra Processor Applications and High-
Accuracy Architectures".

12.  Carnegie-Mellon  University,  ECE  Sophomore  Seminar  -  Pittsburgh,  Pennsylvania, 
"Research  in  the  Center  for  Excellence  in  Optical  Data  Processing". 

13.  University  of  Pittsburgh,  Center  for  Multivariate  Analysis  -  Pittsburgh,  PA, 
"Advanced  Multi-Class  Distortion-Invariant  Pattern  Recognition". 

14.  Wright  Patterson  Air  Force  Base  -  Ohio,  "Multi-Functional  Optical  Signal  Processor 
for  Electronic  Warfare". 

15. George Mason University - Washington, D.C., "Optical Information Processing".

16.  SPIE  (IOCC)  Conference  -  Boston,  Massachusetts,  "Optimal  Linear  Discriminant 
Functions". 

November  1984 

17. SPIE Robotics Conference - Boston, MA, "Chord Distributions in Pattern
Recognition".

18. University of Maryland - "Optical Processing for Autonomous Land Vehicle
Navigation".

January  1985 

19. Fairchild Weston - Long Island, NY, "Optical Pattern Recognition and Optical
Processing".

20. SPIE Conference - Los Angeles, CA, "Hybrid Optical/Digital Image Pattern
Recognition: A Review".

21.  SPIE  Conference  -  Los  Angeles,  CA,  "A  Computer  Generated  Hologram  for 
Diffraction-Pattern  Sampling". 

22.  SPIE  Conference  -  Los  Angeles,  CA,  "A  Recent  Review  of  Holography  in  Coherent 
Optical  Pattern  Recognition". 

23.  Sandia  National  Laboratories  -  Albuquerque,  NM,  "Optical  Pattern  Recognition  and 
Optical  Processing". 

February  1985 

24. NASA Lewis - Cleveland, OH, "Optical Linear Algebra Processors (Systolic)".

March  1985 

25. George Washington University - Washington, D.C., "Optical Linear Algebra for
SDI".

26. Lockheed Missiles & Space Co. - Sunnyvale, CA, "Advanced Hybrid Optical/Digital
Pattern Recognition".

27.  OSA  Topical  Meeting  on  Optical  Computing  -  Lake  Tahoe,  NV,  "Fabrication  and 
Testing  of  a  Space  and  Frequency-Multiplexed  Optical  Linear  Algebra  Processor". 

28.  OSA  Topical  Meeting  on  Machine  Vision  -  Lake  Tahoe,  NV,  "Hierarchical  Feature- 
Based  Object  Identification". 

29.  OSA  Topical  Meeting  on  Machine  Vision  -  Lake  Tahoe,  NV,  "Correlation  Filters  for 
Distortion-Invariance  and  Discrimination". 

30.  Texas  Instruments  -  Dallas,  TX,  "Optical  Pattern  Recognition". 

April  1985 

31. Electro-Com Automation, Inc. - Dallas, TX, "Optical Pattern Recognition".

32. Eglin Air Force Base - Ft. Walton Beach, FL, "Optical Pattern Recognition and
Kalman Filtering".

May 1985

33.  Carnegie-Mellon  University  -  Board  of  Trustees,  "Optical  Data  Processing". 

August  1985 

34.  SPIE  -  San  Diego,  CA,  "Correlation  Synthetic  Discriminant  Functions  for  Object 
Recognition  and  Classification  in  High  Clutter". 

35.  SPIE  -  San  Diego,  CA,  "A  Factorized  Extended  Kalman  Filter". 

September  1985 

36. SPIE - Cambridge, MA, "Parameter Estimation and In-Plane Distortion Invariant
Chord Processing".

37.  SPIE  -  Cambridge,  MA,  "Optical  Processing  Techniques  for  Advanced  Intelligent 
Robots  and  Computer  Vision". 

38.  SPIE  -  Cambridge,  MA,  "High-Dimensionality  Feature-Space  Processing  with 
Computer  Generated  Holograms". 

October  1985 

39. SDI - Washington, D.C., "Optical Data Processing for SDI".

40.  Martin  Marietta  -  Denver,  CO,  "Optical  Data  Processing". 

November  1985 

41. IEEE Computer Society Workshop on Computer Architectures for Pattern Analysis
and Image Database Management - Miami Beach, FL, "Optical Computer
Architectures for Pattern Analysis".

January  1986 

42. SPIE Engineering Update Series, "Fourier Optics for Electrical Engineers" - Los
Angeles, CA.

43. SPIE Engineering Update Series, "Optical Data Processing", Los Angeles, CA.

44. SPIE Conference - Los Angeles, CA, "A Feature Space Rule-Based Optical Relational
Graph Processor".

45. SPIE Conference - Los Angeles, CA, "Optical Linear Algebra Processors:
Architectures and Algorithms".

46. SPIE Conference - Los Angeles, CA, "Optical AI Symbolic Correlators: Architecture
and Filter Considerations".

47.  Optical  Society  of  America  -  Los  Angeles,  CA,  "Optical  Computing". 

48.  Corporate  Advisory  Group  on  Optical  Information  Processing  -  Los  Angeles,  CA, 
"Optical  Computing". 

49. Jet Propulsion Laboratory/NASA - Pasadena, CA, "Optical Linear Algebra and
Pattern Recognition Processors".

February  1986 

50. Computer Science Department, Carnegie-Mellon University - Pittsburgh, PA,
"Optical AI Pattern Recognition Research in ECE".

March  1986 

51. Carnegie-Mellon University, Professional Education Program - Pittsburgh,
Pennsylvania, "Optical Data Processing".

52.  Air  Force  Institute  Conference  of  Technology  -  Dayton,  Ohio,  "Optical  Data 
Processing  at  Carnegie-Mellon  University". 

53.  Mars  Electronics  -  Philadelphia,  PA,  "Optical  Pattern  Recognition". 

54. SPIE Advanced Institute Series on Hybrid and Optical Computers - Leesburg,
Virginia, "Scene Analysis Research: Optical Pattern Recognition and Artificial
Intelligence".

April  1986 

55. SPIE Conference - Orlando, FL, "Model-Based System for On-Line Affine Image
Transformations".

56. Robotics Institute - Carnegie-Mellon University - Pittsburgh, PA, "Optical AI
Pattern Recognition Research in ECE".

May  1986 

57.  IBM,  Federal  Systems  Division  -  Manassas,  VA,  "Optical  Computing". 

58.  General  Electric  -  Philadelphia,  PA,  "Adaptive  Optical  Processing". 

59.  Litton  Data  Systems  -  Van  Nuys,  CA,  "Multiple  Degree  of  Freedom  Pattern 
Recognition". 

60.  Rockwell  Corporation  -  Seal  Beach,  CA,  "Optical  Signal  Processing". 

61. NASA Jet Propulsion Laboratory, California Institute of Technology - Pasadena, CA,
"Multiple Degree of Freedom Optical Pattern Recognition".

62.  SPIE  Engineering  Update  Series,  "Fourier  Optics  and  Components  for  Electrical 
Engineers"  -  Los  Angeles,  CA. 

63. Philip Morris Corporation - Richmond, VA, "Applications of Optical Data Processing
to Automated Inspection".

June  1986 

64. Carnegie-Mellon University, Professional Education Program - Pittsburgh, PA,
"Optical Pattern Recognition".

65. Carnegie-Mellon University, Professional Education Program - Pittsburgh, PA,
"Optical Signal Processing".

66. SPIE Engineering Update Series, "Fourier Optics and Components for Electrical
Engineers" - Tufts University, Boston, MA. Boston, MA - "Operations Achievable".

67.  University  of  Pretoria  -  Pretoria,  South  Africa,  "Optical  Data  Processing". 

July  1986 

68. IOCC Conference - Jerusalem, Israel, "Optical Artificial Intelligence Processors".

August  1986 

69. SPIE Conference - San Diego, CA, "Distortion-Invariant Associative Processors".

September  1986 

70.  ALCOA  -  Pittsburgh,  PA,  "Optical  Information  Processing". 

71. General Electric - Philadelphia, PA, "Optical Processing".

72. Eikonix Corp. - Boston, MA, "Optical Pattern Recognition for Optical Character
Recognition".

73. Penn State University - State College, PA, "Optical Scene Analysis and Artificial
Intelligence".

October  1986 

74. Advanced Technology Intl. - Boston, MA, "Optical Information Processing".

75. Advanced Technology Intl. - Orlando, FL, "Optical Information Processing".

76. Advanced Technology Intl. - Washington, D.C., "Optical Information Processing".

77. Carnegie-Mellon University, Professional Education Program (presented to IBM) -
Pittsburgh, Pennsylvania, "Optical Pattern Recognition".

78. Carnegie-Mellon University, Professional Education Program (presented to IBM) -
Pittsburgh, Pennsylvania, "Optical Data Processing".

79. SPIE Conference - Boston, MA, "Hierarchical Processor and Matched Filters for
Range Image Processing".

80.  SPIE  Conference  -  Boston,  MA,  "Large  Class  Iconic  Pattern  Recognition:  An  OCR 
Case  Study". 

81.  Carnegie  Mellon  University,  ECE  Graduate  Seminar  -  Pittsburgh,  PA,  "Optical 
Computing  in  ECE.  1986". 

November  1986 

82. ICALEO'86 - Arlington, VA, "Advanced Optical Pattern Recognition and Artificial
Intelligence".

83. Optical Society of America (San Diego Chapter) - San Diego, CA, "Optical
Computing".

December  1986 

84. Philip Morris - Richmond, VA, "Optical Pattern Recognition for Inspection and
Robotics".

85. ORD - Washington, D.C., "Optical Computing Accomplishments".

January  1987 

86. SPIE Conference - Los Angeles, CA, "A Directed Graph Optical Processor".

87. SPIE Conference - Los Angeles, CA, "Complex Data Handling in Analog and High-
Accuracy Optical Linear Algebra Processors".

88. SPIE Conference - Los Angeles, CA, "Parameter Selection for Iconic and Symbolic
Pattern Recognition Filters".

89. SPIE Conference - Los Angeles, CA, "1-D Acousto Optic Processing of 2-D Image
Data".

90. SPIE Conference - Los Angeles, CA, "Optical Pattern Recognition and Artificial
Intelligence: A Review" (Invited Keynote Speaker).

91. SPIE Conference - Los Angeles, CA, "Optical Pattern Recognition and AI Algorithms
and Architectures for ATR and Computer Vision" (Invited).

92. SPIE Conference - Los Angeles, CA, "Electro Optic Target Detection and Object
Recognition" (Invited).

93. Workshop on Space Telerobotics - NASA JPL, Pasadena, CA, "Multiple Degree of
Freedom Optical Pattern Recognition".

94. Hewlett Packard - Palo Alto, CA, "Optical Computing".

February  1987 

95. ISC Defense Systems, Inc. - Lancaster, PA, "Optical Computing and Signal
Processing".

96. DARPA - Washington, D.C., "Optical Computing: A Review".

March  1987 

97. Advanced Technology Intl., Short Course - Los Angeles, CA, "Optical Information
Processing".

98. Advanced Technology Intl., Short Course - San Diego, CA, "Optical Information
Processing".

99. Advanced Technology Intl., Short Course - Anaheim, CA, "Optical Information
Processing".

100. Advanced Technology Intl., Short Course - Palo Alto, CA, "Optical Information
Processing".

101. Aerospace Corporation - Los Angeles, CA, "Optical Computing and Signal
Processing Research at CMU".

102. OSA Topical Meeting on Optical Computing - Lake Tahoe, NV, "Rule-Based,
Probabilistic, Symbolic Target Classification by Object Segmentation".

May  1987 

103. NASA Langley Research Center - Hampton, VA, "Machine Vision".

June  1987 

104. Perkin-Elmer - White Plains, NY, "Optical Computing".

July  1987 

105. Carnegie Mellon University - ECE Department, Presentation to the attendees of the
Fault Tolerant Computing Conference, Pittsburgh, PA.

August 1987

106. UCLA Extension Course - Los Angeles, CA, "Optical Computing".

107. Mathematical Modeling Conference - St. Louis, MO, "Computations with Optical
Computers".

108. TRW - Los Angeles, CA, "Optical Data Processing of Synthetic Aperture Radar
Signals for Pattern Recognition".

109. Galileo - Sturbridge, MA, "Product Opportunities in Optical Data Processing".

110. General Electric - Valley Forge, PA, "Recent Progress in Adaptive Optical Data
Processing".

September  1987 

111. Defense Science Board, Pentagon - Washington, D.C., "Optical Computing for
Automatic Target Recognition".

October  1987 

112. AIAA Computers in Aerospace VI Conference - Boston, MA, "Multi-functional
Optical Logic, Numerical and Pattern Recognition Processor".

113. Philip Morris Corporation - Richmond, VA, "Optical Processing for Product
Inspection".

November  1987 

114. SPIE Robotics Conference - Boston, MA, "Associative Memory Synthesis,
Performance, Storage Capacity and Updating: New Heteroassociative Memory
Results".

115. SPIE Robotics Conference - Boston, MA, "Rule-Based String Code Processor".

116. SPIE Robotics Conference - Boston, MA, "Model-Based Satellite Acquisition and
Tracking".

117. SPIE Robotics Conference - Boston, MA, "Optical Processor for Product Inspection".

118. SPIE Robotics Conference - Boston, MA, "Optical Feature Extraction for High-Speed
Inspection".

119. SPIE Robotics Conference - Boston, MA, "Multi-Sensor Processing Object Detection
and Identification".

December  1987 

120. National Security Agency - Maryland, "Optical Information Processing".

12.3 THESES SUPPORTED BY AFOSR FUNDING
(SEPTEMBER  1984-DATE) 

1. Eugene Pochapsky, M.S. Dissertation, "The Simulation of Optical Pattern
Recognition Systems", September 1984.

2. William Rozzi, M.S. Dissertation, "Advanced Quantitative Synthetic Discriminant
Function Tests on Ship Imagery", December 1984.

3. James Fisher, M.S. Dissertation, "Extended Kalman Filter Algorithms for
Implementation on a High-Accuracy Optical Processor", December 1984.

4. W.T. Chang, Ph.D. Dissertation, "Chord Distributions and Correlation SDFs in
Pattern Recognition", March 198V.

5. Andrew J. Lee, M.S. Dissertation, "High-Dimensionality Feature Space Pattern
Recognition Using Computer Generated Holograms", January 1980.

6. Abhijit Mahalanobis, M.S. Dissertation, "Application of Synthetic Discriminant
Functions for Optical Character Recognition", September 198a.

7. Jeffrey Richards, M.S. Dissertation, "Optical Processing for Product Inspection",
November 1986.

8. Brian Telfer, M.S. Dissertation, "Optical Associative Memories for Distortion-
Invariant Pattern Recognition", February 1987.

9. Abhijit Mahalanobis, Ph.D. Dissertation, "New Correlation Filters for Symbolic
Rule-Based Pattern Recognition", August 1987.

10. Raghuram Krishnapuram, Ph.D. Dissertation, "Hough Space Associative Processors
for Pattern Recognition", August 1987.


